Abstract: In this paper, we examine the optimal quantization of signals for system identification. We deal with memoryless
quantization for the output signals and derive the optimal quantization schemes. The objective functions are the errors of least
squares parameter estimation subject to a constraint on the number of subsections of the quantized signals or the expectation
of the optimal code length for either high or low resolution. In the high-resolution case, the optimal quantizer is found by
solving Euler-Lagrange's equations and the solutions are simple functions of the probability densities of the regressor vector.
In order to clarify the minute structure of the quantization, the optimal quantizer in the low resolution case is found by solving
recursively a minimization of a one-dimensional rational function. The solution has the property that it is coarse near the
origin of its input and becomes dense away from the origin in the usual situation. Finally the required quantity of data to
decrease the total parameter estimation error, caused by quantization and noise, is discussed.

Keywords: system identification, quantization, networked control, least squares method, FIR model, entropy 

1 Introduction 



The recent rapid improvement in the transmission capacity of computer networks has made long-distance automatic control
more realistic, and the necessity of understanding the effects of transmission limitations on the information in control systems
has become more widely accepted. In particular, quantization of the signals to reduce the information content of the transmitted
signals in control systems has been discussed actively by several control research groups during the last few years and
interesting results have been achieved.

The problem of signal quantization has a long history going back to the 1940s, and is one of the main themes in the area
of information theory (e.g., see [13]). The problem is to attain low distortion between the original and the quantized signals
subject to constraints on the amount of information. Naturally, the situations and objectives for data transmission and those for
control systems are essentially different and the need for research on the latter case has been recognized. However, although
elementary discussion in the control community dates from the 1970s (e.g., see [5]), rigorous analysis did not begin until the
late 1980s. The main difficulty of quantization in control systems lies in their dynamics; the result by [6, 7] is recognized
as a breakthrough, in which the behavior of control systems and their stability or state estimation are analyzed in detail. In
the last few years, stabilization problems of quantized systems have been actively investigated in several different situations,
e.g., [26, 27, 3, 16, 8, 17, 23, 18]. Of these, a logarithmic quantizer was shown to be the coarsest, in some sense, that achieves a
kind of asymptotic stability [8]; this result reveals the variations in the importance of signals, depending on their magnitudes
and the directions in the signal space, from the viewpoint of system control.

*The technical report/conference versions of this paper are in [22, 21, 24].

With this background, our interests naturally shifted to the system identification problem; that is, what quantization scheme
is optimal for system identification? We expect that the answer to this question will clarify the amount of information in the
signals necessary for parameter estimation. Unfortunately, however, compared to the research activity in the stabilization or
estimation problem, the optimal quantization problem for system identification [10] has not been adequately considered. The
main subject of this paper is to answer this fundamental question.

In this paper, we consider the optimal memoryless quantization problem of output signals that are used for parameter
estimation. The identified system is a simple single-input single-output (SISO) finite impulse response (FIR) model, chosen in
order to reveal the essential properties of the optimal quantization in system identification and to aid intuitive understanding. By
optimality in this paper we mean the minimization of the variance of the parameter estimation error given by the least squares
method with a constraint on the number of quantization steps or the expectation of the code length of the optimally coded
quantized signals. We consider this problem for two cases: (1) high quantization resolution with weak assumptions on the input,
and (2) low quantization resolution with some specific assumptions on the input. The difficulty of the problem lies in the
complex correlation between the input signals and the quantization errors, and resolving this is the key to the optimization
problem.

In the high resolution case (Section 3), we introduce a key concept, the density of the number of quantized subsections, 
and by using calculus of variations, analytic solutions are derived subject to the constraint on the number of quantization steps 
or the optimal code length. The solutions are functions of the probability density of the input signals and we can rigorously 
calculate the profile of the density of the number of the optimally quantized subsections. Moreover, these results suggest 
several insights into system identification with finite information. We illustrate these facts for some cases and describe the 
complexity of the problem of system identification. 

The results in Section 3 show that the quantization resolution around the origin of the signals relatively becomes coarse in 
usual cases. In order to clarify the minute structure of the quantization and complement the results in Section 3, we consider 
the low resolution case in Section 4. We give the optimal quantizer with a condition of uniform distribution of input signals. 
The optimal quantizer is given by minimizing a one-dimensional rational function recursively. In a special case, we show 
that the optimal quantization is not uniform and it is coarse near the origin of the quantized signals and becomes dense away 
from the origin. This fundamental property is opposite to the case of stabilization in [8] and reveals duality between system 
identification and stabilization. 

Finally, in Section 5, we compare the effects of the resolution of quantization and the I/O data length. The results show that
the former is more effective for decreasing the quantization error in the estimated system parameters, whereas the latter
is more effective in reducing the noise error. Hence, there exists a trade-off between these two error terms subject to a constant
amount of data, and we can find an appropriate quantizer resolution to balance them by using the results in Section 5.

Note that the main purpose of this paper is to reveal the essential properties of the optimal quantization for system
identification; therefore, the focus of this paper is on the analysis of this problem and not on practical system identification methods.

In this paper, most of the proofs of theorems, lemmas, or propositions are collected in the appendix for ease of understanding
the main theme and the outline of this paper. Refer to these in Appendix A if necessary.

Notation:

$d_j$: eq. (4) and (5)
$E[x]$: expectation of $x$; $E_{\cdot}[x]$: eq. (48)
$e(t) = y'(t) - y(t)$: quantization error at $t$
$e(\bar\phi_1(t)) = e(t)$: quantization error specified by $\bar\phi_1$
$f(x)$: probability density of $x$
$g(\cdot)$: eq. (25)
$H(\cdot)$: entropy of $\cdot$; $H(\cdot,\cdot)$, $H_d(\cdot)$: eq. (38)
$j$: index of quantized subsections
$M$: number of quantization subsections
$M'$: associate number of quantization subsections (53)
$N$: data length
$n$: order of FIR model
$O(\cdot)$, $o(\cdot)$: orders of $\cdot$ (Landau's symbols)
$P(\cdot)$: eq. (90)
$r_j$, $r_j^o$: ratio or optimal ratio of $d_j$ and $d_{j+1}$ (54)
$\mathcal{S}_j^{\cdot}$: $j$-th subsection on the space of $\cdot$
$T$: variable transformation matrix
$V[x]$: expectation of $\|x\|_2^2$; $V_{\cdot}[x]$: eq. (82)
$y(t) = \phi(t)\theta$: output of FIR model at $t$
$y_o(t)$: observed output (1)
$\theta \in \mathbb{R}^n$: parameter vector of FIR model
$\phi(t)$: regressor vector, eq. (1)
$\bar\phi_1$: 1st element of $\bar\phi$
$\sigma(\bar\phi_1)$: eq. (27)
$\cdot_i$: $i$-th element of vector $\cdot$
$\cdot'$: quantized number of $\cdot$
$\cdot'^{(j)}$: $j$-th quantized number for $\mathcal{S}_j$
$\bar{\cdot}$: transformed vector or matrix of $\cdot$ by $T$



2 Problem Formulation 

The objective of this paper is to show the effect of I/O signal quantizers on the parameter estimation error in as intuitively
understandable a form as possible. In general, the quantization error is strongly correlated with the original signal; therefore,
analysis of the quantization problem in system identification for a general model is difficult, because several types of correlation
are used for parameter estimation. In order to derive analytic and intuitively understandable results for the quantization problem
in system identification, we should formulate the problem in an appropriately tractable form.

From the above observations, in this paper, we deal with a system identification problem by the least squares criterion for a
simple discrete-time SISO FIR model. The plant is:

$$ y_o(t) = q(y(t)) + w(t), \quad y(t) = \phi(t)\theta, \tag{1} $$
$$ \phi(t) := [\,u(t)\ \ u(t-1)\ \cdots\ u(t-n+1)\,], \quad \theta := [\,\theta_1\ \ \theta_2\ \cdots\ \theta_n\,]^T, $$
$$ y_o,\ y,\ w,\ u \in \mathbb{R}, \quad \phi \in \mathbb{R}^{1\times n}, \quad \theta \in \mathbb{R}^{n\times 1}, $$

where $w$ is random noise, $q(y)$ is the quantized value of the original analogue output $y$, $y_o$ is the observed output, $\phi$ is
the regressor vector, $\theta$ is the system parameter vector, $n$ is the dimension of the FIR model, $u$ is the input, and $t$ is the time index.

We assume that $u$ and $w$ are independent. The input $u$ and the associated regressor vector $\phi$ are a realization of a stochastic
process with a joint density function $f(\phi_1, \phi_2, \ldots, \phi_n)$ of $\phi_1, \phi_2, \ldots, \phi_n$, where $\phi_i$ denotes the $i$-th element of $\phi$. The class
of $f(\phi_1, \phi_2, \ldots, \phi_n)$ considered in this paper is described below.

Note 2.1 We also consider the case in which the noise enters before the quantizer,

$$ y_o(t) = q(y(t) + w(t)) \tag{2} $$

in [24] (the long version of this paper). The result suggests that, when the noise enters as in (2), the effect of quantization on the
magnitude of the parameter estimation error is approximately twice that of (1). From that result, it is enough to analyze the
form of (1) in order to know the essential property of the optimal quantization. To avoid complicated notation and focus on
the quantization effect for system identification, we treat the plant (1) in this paper. ◇



The quantizer $q$ is a memoryless symmetric type defined by:

$$ q(y) := y'^{(j)} \quad \text{when } y \in \mathcal{S}_j^y \tag{3} $$
$$ \mathcal{S}_0^y := \{y = 0\}, \quad \mathcal{S}_j^y := \{y : d_{j-1} < y \le d_j\},\ j > 0, \quad \mathcal{S}_j^y := \{y : d_j \le y < d_{j+1}\},\ j < 0 \tag{4} $$
$$ d_0 = 0 < d_1 < d_2 \cdots, \quad d_{-1} = -d_1,\ d_{-2} = -d_2,\ \cdots, \tag{5} $$

where $y'^{(j)}$ is the quantized value assigned to the subsection $\mathcal{S}_j^y$. The quantizer $q$ is symmetrical with respect to the origin, and
hereinafter we may omit references to the negative subsections $\mathcal{S}_{-1}^y, \mathcal{S}_{-2}^y, \ldots$ if they are obvious from the context. Note that
a form $\mathcal{S}_0^y = \{y : -d_1 < y < d_1\}$ is also possible for $\mathcal{S}_0^y$; however, it is shown in Section 4 not to be optimal, and without
loss of generality, we consider the form of (4) hereafter.
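The definition (3)-(5) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `make_symmetric_quantizer` is hypothetical, the representative value of each cell is taken to be its midpoint (the paper leaves $y'^{(j)}$ to be chosen), and inputs beyond the last threshold are clipped into the outermost cell.

```python
import bisect


def make_symmetric_quantizer(thresholds):
    """Build q(y) for thresholds 0 < d_1 < d_2 < ... < d_J (d_0 = 0 is implicit).

    Cell j > 0 is (d_{j-1}, d_j], cell j < 0 is its mirror image, cell 0 is {0}.
    Assumptions of this sketch: midpoints are used as representative values,
    and |y| > d_J is clipped into the outermost cell.
    """
    d = [0.0] + list(thresholds)
    if any(b <= a for a, b in zip(d, d[1:])):
        raise ValueError("thresholds must be positive and strictly increasing")
    mid = [0.5 * (a + b) for a, b in zip(d, d[1:])]  # mid[j-1] represents cell j

    def q(y):
        if y == 0:
            return 0.0
        a = abs(y)
        # smallest j >= 1 with a <= d_j, i.e. a lies in (d_{j-1}, d_j]
        j = min(bisect.bisect_left(d, a), len(mid))
        value = mid[j - 1]
        return value if y > 0 else -value

    return q


q = make_symmetric_quantizer([1.0, 2.0, 4.0])
```

Because the negative cells are generated by mirroring, only the positive thresholds need to be stored, which matches the remark that references to the negative subsections can be omitted.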

Following the standard least squares method, we propose the estimated parameter $\hat\theta$ with a sufficient length of I/O data,
$\{u(t)\}$ and $\{y_o(t)\}$, as:

$$ \hat\theta = (U^TU)^{-1}U^TY_o = (U^TU)^{-1}U^T(Y' + W) = (U^TU)^{-1}U^T(Y + E + W), \tag{6} $$

where

$$ U := [\,\phi(1)^T\ \phi(2)^T\ \cdots\ \phi(N)^T\,]^T, \quad W := [\,w(1)\ w(2)\ \cdots\ w(N)\,]^T, $$
$$ Y_o := [\,y_o(1)\ y_o(2)\ \cdots\ y_o(N)\,]^T, \quad Y := [\,y(1)\ y(2)\ \cdots\ y(N)\,]^T, $$
$$ Y' := [\,y'(1)\ y'(2)\ \cdots\ y'(N)\,]^T, \quad y'(t) := q(y(t)), $$
$$ E := [\,e(1)\ e(2)\ \cdots\ e(N)\,]^T, \tag{7} $$
$$ e(t) := y'(t) - y(t), \tag{8} $$

and $N$ is the I/O data length. We call $e$ the quantization error between $y'$ and $y$. The estimated parameter $\hat\theta$ can also be
written as:

$$ \hat\theta = (U^TU)^{-1}U^T(U\theta + E + W) = \theta + \Delta E + \Delta W, $$
$$ \Delta E := (U^TU)^{-1}U^TE, \quad \Delta W := (U^TU)^{-1}U^TW. \tag{9} $$

This shows that the estimation error $\hat\theta - \theta$ can be evaluated from the magnitudes of the quantization error term $\Delta E$ and the
noise error term $\Delta W$.
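The decomposition of the estimation error into a quantization term and a noise term can be checked numerically. The following is a minimal sketch, not the paper's experiment: the parameter values, noise level, and the uniform rounding quantizer are all assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 3, 2000                        # FIR order and data length (illustrative)
theta = np.array([1.0, -0.5, 0.25])   # hypothetical true parameters

u = rng.standard_normal(N + n - 1)
# Row t of U is the regressor phi(t) = [u(t), u(t-1), ..., u(t-n+1)]
U = np.column_stack([u[n - 1 - k: N + n - 1 - k] for k in range(n)])

y = U @ theta                         # noise-free FIR output
step = 0.5                            # uniform quantizer step (assumption)
y_q = step * np.round(y / step)       # y'(t) = q(y(t))
w = 0.1 * rng.standard_normal(N)      # additive noise after the quantizer
y_o = y_q + w                         # observed output, as in eq. (1)

E = y_q - y                           # quantization error, eq. (8)
ls_map = np.linalg.solve(U.T @ U, U.T)   # (U^T U)^{-1} U^T
theta_hat = ls_map @ y_o              # least squares estimate, eq. (6)
delta_E = ls_map @ E                  # quantization error term in eq. (9)
delta_W = ls_map @ w                  # noise error term in eq. (9)
```

Here `theta_hat - theta` agrees with `delta_E + delta_W` up to floating-point error, which is exactly the statement of (9).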

In the quantization-free case, i.e., $e = 0$, (6) is the standard least squares estimation. When $e \neq 0$, (6) is still a realistically
reasonable estimation subject to the minimization of

$$ E[\|\Delta E\|_2^2] \tag{10} $$

because

$$ E[\|\hat\theta - \theta\|_2^2] = E[\|\Delta E + \Delta W\|_2^2] = E[\|\Delta E\|_2^2] + E[\|\Delta W\|_2^2]. $$

The reduction of the noise error term $\Delta W$ is the main theme of normal system identification and has been well investigated.
On the other hand, although the quantization error term $\Delta E$ can in general be reduced by increasing the resolution of the
quantizer, the achievable reduction is limited by the constraint on the resolution, and quantizers that are good at
reducing $\Delta E$ are therefore desired. Here we state the original quantization problem of this paper, which is reduced to
tractable ones in Sections 3 and 4.

Problem 2.1 Find an optimal quantizer $q(y)$:

$$ \min_q E[\|\Delta E\|_2^2] \quad \text{s.t.} \quad E[\Delta E] = 0 \tag{11} $$

under a constraint on the quantization resolution.

Note that the latter condition ensures that the estimated parameters are bias-free.

Note 2.2 In the field of information theory, the quantization problem is also one of the research themes and its objective is
reducing the distortion between the original signal and the quantized signal subject to constraints on the information in the
transmitted signals [1, 15, 11, 2, 9]. The constraint on the information in signals can be given by the number of the quantization
steps or the mean code length of the associated code. The former is called "fixed-rate quantization" and the latter
"variable-rate quantization". In contrast, the purpose in system identification should be the reduction of the estimation error and this is
the definitive difference. ◇

In an ordinary probabilistic framework, a conventional, and reasonable, method to evaluate the noise error term $\Delta W$ is to
show the convergence rate of:

$$ \left(\frac{1}{N}U^TU\right)^{-1} \xrightarrow{N\to\infty} \frac{1}{\sigma_u^2} I, \quad \frac{1}{N}U^TW \xrightarrow{N\to\infty} 0, $$

where $\sigma_u^2$ is the covariance of $u$, by using Slutsky's theorem (see Appendix A), subject to an assumption of the mutual
independence of the input signal $u$ and the noise $w$. This methodology is also basically applicable to the evaluation of $\Delta E$ in
the probabilistic framework. However, unlike the case of the noise error term, $u$ and $e$ are not independent in general
and the evaluation of $U^TE$ is much more complicated. This means the problem appears to be a vector quantization on $U^TE$
with a complex multidimensional distribution. In general, multidimensional optimal quantization is known to be a difficult
problem for analytical solution except in special cases.

Our idea to resolve the above difficulty is to show that the original problem, i.e., minimizing the cost function on
the magnitude of $\Delta E$, can be reduced to a feasible problem, "minimization of a functional of a weighted one-dimensional
quantizer," by the following two steps: 1. finding an orthogonal quantization on the space of the regressor vector that is
equivalent to the original quantization of the output signals; 2. reducing the cost functions to a suitable form by using one of the
base axes in the regressor vector space. Step 1 is described in this section and Step 2 is described in Sections 3 and 4.

We define subsets $\mathcal{S}_j^{\phi}$ of the regressor vector associated with the subsection $\mathcal{S}_j^y$ by:

$$ \mathcal{S}_j^{\phi} := \{\phi : y = \phi\theta \in \mathcal{S}_j^y\}. $$



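We also consider the following variable transformation:

$$ y = \phi\theta = \phi T \cdot T^{-1}\theta = \bar\phi\bar\theta, \quad \bar\phi := \phi T, \quad \bar\theta := T^{-1}\theta = [\,\bar\theta_1\ \ 0\ \cdots\ 0\,]^T, \tag{12} $$

where $T$ is an orthogonal matrix. Note that such $T$ always exists for any $\theta$. Then, $\mathcal{S}_j^{\phi}$ is represented by:

$$ \mathcal{S}_j^{\phi} = \begin{cases} \{\phi : \bar\phi_1\bar\theta_1 \in (d_{j-1}, d_j]\}, & j > 0, \\ \{\phi : \bar\phi_1 = 0\}, & j = 0, \\ \{\phi : \bar\phi_1\bar\theta_1 \in [d_j, d_{j+1})\}, & j < 0. \end{cases} $$

We also define subsections on the space $\bar\phi_1$:

$$ \mathcal{S}_j^{\bar\phi_1} := \begin{cases} \{\bar\phi_1 : \bar\phi_1\bar\theta_1 \in (d_{j-1}, d_j]\}, & j > 0, \\ \{\bar\phi_1 = 0\}, & j = 0, \\ \{\bar\phi_1 : \bar\phi_1\bar\theta_1 \in [d_j, d_{j+1})\}, & j < 0. \end{cases} $$

Then, the subsections $\mathcal{S}_j^y$, $\mathcal{S}_j^{\phi}$, and $\mathcal{S}_j^{\bar\phi_1}$ correspond to each other, and the probability distribution of $y$ depends only on that of $\bar\phi_1$.
Therefore, the variable $\bar\phi_1$ and its subsection $\mathcal{S}_j^{\bar\phi_1}$ are convenient for analyzing the probability distribution of $y$ and the error $e$.
Fig. 1 and Fig. 2 are representations of the relationship between $\mathcal{S}_j^y$, $\mathcal{S}_j^{\phi}$, and $\mathcal{S}_j^{\bar\phi_1}$, or $y$, $\phi$, and $\bar\phi_1$.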
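An orthogonal matrix that rotates the parameter vector onto the first coordinate axis always exists; one concrete construction is a Householder reflection. The sketch below is an assumption of this illustration (the paper does not prescribe how $T$ is built), and the name `householder_T` is hypothetical.

```python
import numpy as np


def householder_T(theta):
    """Return an orthogonal T with T^{-1} theta = [theta_bar_1, 0, ..., 0]^T.

    A Householder reflection is used, so T is symmetric and T^{-1} = T^T = T.
    With this sign choice theta_bar_1 = -sign(theta_1) * ||theta||.
    """
    theta = np.asarray(theta, dtype=float)
    norm = np.linalg.norm(theta)
    if norm == 0:
        raise ValueError("theta must be nonzero")
    e1 = np.zeros_like(theta)
    e1[0] = 1.0
    sign = 1.0 if theta[0] >= 0 else -1.0
    v = theta + sign * norm * e1          # this sign avoids cancellation
    return np.eye(theta.size) - 2.0 * np.outer(v, v) / (v @ v)


theta = np.array([1.0, -0.5, 0.25])       # hypothetical parameter vector
T = householder_T(theta)
theta_bar = T.T @ theta                   # T^{-1} theta for orthogonal T
phi = np.array([0.3, -1.2, 0.7])          # an arbitrary regressor (row vector)
phi_bar = phi @ T
```

With this choice, the output `phi @ theta` equals `phi_bar[0] * theta_bar[0]`, so the output depends on the regressor only through its first transformed coordinate.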




Fig. 1 Diagram of the relationship between $\mathcal{S}_j^y$ and $\mathcal{S}_j^{\phi}$ for $n = 2$

Fig. 2 Diagram of the relationship between $\mathcal{S}_j^{\phi}$ and $\mathcal{S}_j^{\bar\phi_1}$ for $n = 2$

Fig. 3 Quantization on $\phi$ (or $\bar\phi$) for $n = 2$



Associated with $T$, the quantization error term $\Delta E$ and $U$ are also transformed to:

$$ \Delta\bar E := T^{-1}\Delta E, \quad \bar U := UT \tag{13} $$

and $\Delta\bar E$ can be represented as:

$$ \Delta\bar E = T^{-1}(U^TU)^{-1}U^TE = (\bar U^T\bar U)^{-1}\bar U^TE = (\bar U^T\bar U)^{-1}\sum_{t=1}^{N}\bar\phi(t)^T e(\bar\phi_1(t)). \tag{14} $$

Note that $\|\Delta\bar E\|_2^2 = \|\Delta E\|_2^2$ because $T$ is an orthogonal matrix. From the above, the quantizer can be
considered to be an orthogonal and symmetric type along each axis $\bar\phi_i$ in the sense that each axis $\bar\phi_i$ is partitioned by the same
rule (see Fig. 3).

In Sections 3 and 4, we first derive key lemmas, respectively, to show that the quantity $\|\Delta\bar E\|_2^2 = \|\Delta E\|_2^2$ can be
represented as a functional of the one-dimensional marginal density function $f(\bar\phi_1)$ and the quantizer on $\bar\phi_1$, subject to appropriate
assumptions.



3 High Resolution Quantization 

In this section, we derive optimal quantizers under considerably weak conditions on the probability densities $f(\bar\phi)$ where the
quantizers are assumed to be high resolution. First, we state the following assumption:

Assumption 3.1 The input $u$ and the density function $f(\bar\phi)$ satisfy the following conditions:

1: $u(t)$, $t = \ldots, 1, 2, \ldots$ are mutually independent.
2: $f(\bar\phi)$ is a continuous function s.t. $f(\bar\phi)$ satisfies:

$$ f(\bar\phi) = \delta_0 + \sum_i \delta_i(\bar\phi_i - \bar\phi_i^o) + \sum_{i,j}\delta_{ij}(\bar\phi_i - \bar\phi_i^o)(\bar\phi_j - \bar\phi_j^o) + O\big((\bar\phi_i - \bar\phi_i^o)(\bar\phi_j - \bar\phi_j^o)(\bar\phi_k - \bar\phi_k^o)\big), \quad |\delta_{\cdot}| < \infty \tag{15} $$

in the neighborhood of an arbitrary $\bar\phi^o = [\,\bar\phi_1^o\ \bar\phi_2^o\ \cdots\ \bar\phi_n^o\,] \in \{\bar\phi\}$.

These conditions are not strong in the usual setting of system identification. In particular, the essence of (15) is to guarantee
the continuity of $f(\bar\phi)$ and it is usually satisfied; e.g., (15) is satisfied when $f(\bar\phi)$ is a multidimensional normal distribution.
This technical condition is used in the proof of Lemma 3.1.

The first Assumption 3.1.1 gives the convergence of $\frac{1}{N}U^TU$ or $\frac{1}{N}\bar U^T\bar U$ to $\sigma_u^2 I$, where $\sigma_u^2$ is a covariance of $u$, as $N \to \infty$,
and therefore,

$$ N\|\Delta E\|_2^2\ (= N\|\Delta\bar E\|_2^2) \to \mathrm{tr}\left[\operatorname*{plim}_{N\to\infty}\left(\frac{1}{N^2}U^TUU^TU\right)^{-1}\operatorname*{plim}_{N\to\infty}\left(\frac{1}{N}U^TEE^TU\right)\right] = \frac{1}{\sigma_u^4}\operatorname*{plim}_{N\to\infty}\frac{1}{N}E^TUU^TE \tag{16} $$

by Slutsky's theorem (see Appendix A). Moreover, we get:

$$ \operatorname*{plim}_{N\to\infty}\frac{1}{N}E^TUU^TE = \frac{1}{N}V[U^TE]\ \left(= \frac{1}{N}V[\bar U^TE]\right), \tag{17} $$

therefore,

$$ N\|\Delta E\|_2^2 \simeq \frac{1}{\sigma_u^4 N}V[U^TE] \tag{18} $$

at sufficiently large $N$. Then, it is reasonable to find an optimal quantizer that:

1) minimizes $V[U^TE]$ $(= V[\bar U^TE])$

2) subject to constraints on the resolution of the quantizer, is free of bias from the quantization error term, such that $E[U^TE] = 0$
(equivalently $E[\bar U^TE] = 0$).

The minimization of $V[U^TE]$ for arbitrary resolution of the quantizer is too complex to expect meaningful results;
however, it is possible to derive the analytic solution in high resolution as shown in the remainder of this section.
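The convergence of $\frac{1}{N}U^TU$ to $\sigma_u^2 I$ under mutually independent inputs, and of $\frac{1}{N}U^TW$ to zero when the noise is independent of the input, can be observed in simulation. The following sketch assumes Gaussian input and noise and arbitrary illustrative values for the order, data length, and input standard deviation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, N, sigma_u = 3, 200_000, 2.0       # illustrative values (assumptions)

u = sigma_u * rng.standard_normal(N + n - 1)   # i.i.d. input, variance sigma_u^2
# Row t of U is phi(t) = [u(t), u(t-1), ..., u(t-n+1)]
U = np.column_stack([u[n - 1 - k: N + n - 1 - k] for k in range(n)])
w = rng.standard_normal(N)            # noise independent of the input

gram = U.T @ U / N                    # approaches sigma_u^2 * I as N grows
cross = U.T @ w / N                   # approaches the zero vector as N grows
```

For this data length the entries of `gram` differ from $\sigma_u^2 I$ only in the second decimal place, which is the regime in which replacing $(U^TU)^{-1}$ by $(N\sigma_u^2)^{-1}I$ is reasonable.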



Note 3.1 The multidimensional optimal quantization problem has been investigated (e.g., see [13, 12, 19, 9]) and the research
focus is on the derivation of analytic solutions. In the general resolution case, it is known to be a difficult problem and limited
cases have been solved. One of these is the case of one-dimensional quantization and another is the asymptotic case when
the resolution of quantizers is sufficiently high. Note that cost functions are $E[\|X - q(X)\|^r]$ in these studies. However, we
consider the cost function $E[\|U^TE\|_2^2]$ in this paper, which originates in system identification parameter estimation. The
evaluation of the latter is much more complicated because it contains many correlations of variables and resolving this difficulty is
one of the main themes of this paper (note that the latter is not a simple weighted square-error distortion because of the correlation
between $\bar\phi_1$ and $e = \bar\phi_1\bar\theta_1 - q(\bar\phi_1\bar\theta_1)$). The key lemmas (Lemma 3.1 and 4.1) show that this quantity can be represented as a
functional of one-dimensional functions with one-dimensional quantization rules under appropriate assumptions and, by using
them, we can find the optimal quantizers. ◇



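On the above minimization problem, the bias-free condition $E[U^TE] = 0$ is equivalent to $E[\bar U^TE] = 0$ from the relation
$\bar U^TE = T^TU^TE$, where $T$ is nonsingular and orthogonal. From (14), this condition is equivalent to

$$ E\left[\sum_{t=1}^{N}\bar\phi_k(t)e(t)\right] = N \cdot E[\bar\phi_k \cdot e(\bar\phi_1)] = N\int\!\!\int \bar\phi_k e(\bar\phi_1) f(\bar\phi_1, \bar\phi_k)\,d\bar\phi_1 d\bar\phi_k = 0 \tag{19} $$

for $k = 2, 3, \ldots, n$ and

$$ E\left[\sum_{t=1}^{N}\bar\phi_1(t)e(t)\right] = N \cdot E[\bar\phi_1 \cdot e(\bar\phi_1)] = N\int \bar\phi_1 e(\bar\phi_1) f(\bar\phi_1)\,d\bar\phi_1 = 0 \tag{20} $$

for $k = 1$. Note that we use the notation $e(\bar\phi_1(t))$ when we intend to specify that $e(t)$ is a function of $\bar\phi_1(t)$, which can be
seen from (14). The notation $f(\bar\phi_1)$ represents a marginal density function:

$$ f(\bar\phi_1) := \int f(\bar\phi_1, \bar\phi_2, \ldots, \bar\phi_n)\,d\bar\phi_2\cdots d\bar\phi_n. \tag{21} $$

The notations $f(\bar\phi_i, \bar\phi_j)$, $f(\bar\phi_i, \bar\phi_j, \bar\phi_k)$, $\ldots$ are similarly defined.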

With the continuity condition of $f(\bar\phi)$ in Assumption 3.1.2, (19) and (20), i.e., the bias-free condition $E[U^TE] = 0$
$(E[\bar U^TE] = 0)$, are asymptotically satisfied as the widths of the quantization steps tend to 0 with the setting of $y'^{(j)}$ at
the center of the quantization subsections. On the other hand, for the cost function $V[U^TE]$ $(= V[\bar U^TE])$, which can be
represented by

$$ V[U^TE]\ (= V[\bar U^TE]) = E\left[\sum_{k=1}^{n}\left(\sum_{t=1}^{N}\bar\phi_k(t)e(t)\right)^2\right], \tag{22} $$

we derive the following key lemma.

Lemma 3.1 Assume that $f(\bar\phi)$ satisfies (15) in Assumption 3.1.2. Then,

$$ E\left[\sum_{k=1}^{n}\left(\sum_{t=1}^{N}\bar\phi_k(t)e(t)\right)^2\right] \xrightarrow{\Delta y_{\max}\to 0} N E\left[\sum_{k=1}^{n}\bar\phi_k^2 e^2(\bar\phi_1)\right], \tag{23} $$

where $\Delta y_{\max}$ is the maximum width of the subsections $\mathcal{S}_j^y$ of the quantizer defined by $\Delta y_{\max} := \max_j |d_{j+1} - d_j|$.

The proof of this lemma is given in Appendix A.

From this lemma, the cost function $V[U^TE]$ $(= V[\bar U^TE])$ can be approximated by:

$$ V[U^TE]\ (= V[\bar U^TE]) \xrightarrow{\Delta y_{\max}\to 0} N\sum_{k=1}^{n}E[\bar\phi_k^2 e^2(\bar\phi_1)] = N\sum_{k=1}^{n}\int \bar\phi_k^2 e^2(\bar\phi_1) f(\bar\phi_1, \bar\phi_2, \ldots, \bar\phi_n)\,d\bar\phi_1 d\bar\phi_2\cdots d\bar\phi_n $$
$$ = N\int\left\{\int\sum_{k=1}^{n}\bar\phi_k^2 f(\bar\phi_1, \bar\phi_2, \ldots, \bar\phi_n)\,d\bar\phi_2\cdots d\bar\phi_n\right\} e^2(\bar\phi_1)\,d\bar\phi_1 \tag{24} $$

in the high resolution case. Therefore, the focus of the problem is on the calculation of the r.h.s. of (24) for general $f(\bar\phi)$
and its minimization. A key concept in solving this problem is the introduction of the following quantity in the distribution of
quantization subsections, which is a reasonable concept in the high resolution case.



Definition 3.1 The quantity $g(\bar\phi_1)$, which satisfies

$$ g(\bar\phi_1)\,d\bar\phi_1 = \text{number of quantized subsections in } d\bar\phi_1, \tag{25} $$

is called the density of the number of quantized subsections.

This quantity is the same as that introduced in [1, 15] and from this definition, $g(\bar\phi_1)^{-1}$ represents the width of the quantization
step at $\bar\phi_1$.
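Definition 3.1 suggests a direct way to turn a density of subsections into concrete thresholds: place the $j$-th threshold where the cumulative integral of the density reaches $j$. The sketch below is an illustration under assumptions, not the paper's procedure: the helper name `thresholds_from_density` is hypothetical, the density is assumed strictly positive, and only the positive half-axis is treated (the negative thresholds follow by symmetry).

```python
import numpy as np


def thresholds_from_density(g, phi_max, num_grid=20001):
    """Return d_1 < d_2 < ... on (0, phi_max] with integral_0^{d_j} g = j.

    g must accept a NumPy array and return strictly positive values.
    """
    phi = np.linspace(0.0, phi_max, num_grid)
    gv = g(phi)
    # Cumulative trapezoidal integral G(phi) = integral_0^phi g
    G = np.concatenate([[0.0], np.cumsum(0.5 * (gv[1:] + gv[:-1]) * np.diff(phi))])
    levels = np.arange(1, int(np.floor(G[-1] + 1e-9)) + 1)
    return np.interp(levels, G, phi)      # invert the increasing map G


# A constant density of 4 subsections per unit length gives uniform steps of 1/4.
d = thresholds_from_density(lambda p: np.full_like(p, 4.0), phi_max=2.0)
```

The step widths `np.diff(d)` are then approximately the reciprocal of the density at each location, which is the interpretation of $g^{-1}$ given above.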

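We also assume a form of smoothness of $f(\bar\phi)$ and $g(\bar\phi_1)$ in the following.

Assumption 3.2 The density function $f(\bar\phi)$ and $g(\bar\phi_1)$ satisfy the following conditions:

1: $f(\bar\phi)$ is a continuous function s.t.

$$ \left|\frac{d\,\cdots}{d\bar\phi_1}\right| < \infty, \tag{26} $$
$$ \sigma(\bar\phi_1) := \left(f(\bar\phi_1)^{-1}\int\sum_{k=1}^{n}\bar\phi_k^2 f(\bar\phi_1, \bar\phi_2, \ldots, \bar\phi_n)\,d\bar\phi_2\cdots d\bar\phi_n\right)^{1/2}, \tag{27} $$

where $f(\bar\phi_1)$ is the marginal density function on the space of $\bar\phi_1$ defined by (21).

2: the resolution of the quantizer is sufficiently high and the density $g(\bar\phi_1)$ satisfies:

$$ \left|\frac{d\,\cdots}{d\bar\phi_1}\right| < \infty. $$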

Note 3.2 The essence of Assumption 3.2 is the smoothness of $f(\bar\phi_1)$ and $g(\bar\phi_1)$ such that they guarantee the approximation of
(24) in the following. Assumption 3.2.1 describes a form of the continuity of $f(\bar\phi)$ or $f(\bar\phi_1)$ and it is not a strong assumption
in the usual situation of system identification; e.g., $f(\phi)$ or $f(\bar\phi)$ in $C^1$ is enough and it is satisfied when they are
multidimensional normal distributions. Assumption 3.2.2 also describes a form of the continuity of the quantizer and $g(\bar\phi_1)$ or $g(y) \in C^2$
is enough. Such technical conditions come from our intention to make the necessary conditions for deriving (28) as weak as
possible.

With Assumption 3.2.2, we can select a value $g_j^{-1} \simeq g(\bar\phi_1)^{-1}$ for the subsection $\mathcal{S}_j^{\bar\phi_1}$ that satisfies $g_j^{-1} = |\mathcal{S}_j^{\bar\phi_1}|$. Moreover,
with $\sigma(\bar\phi_1)$ of $f(\bar\phi)$ at $\bar\phi_1$ defined in (27), Assumption 3.2.1-2, and $\Delta\bar\phi := \max_j \bar\theta_1^{-1}|d_{j+1} - d_j|$, for the objective function
(24), we calculate the following directly:

$$ (24)/N = \int\sigma^2(\bar\phi_1)e^2(\bar\phi_1)f(\bar\phi_1)\,d\bar\phi_1 = \frac{1}{12}\bar\theta_1^2\int g(\bar\phi_1)^{-2}\sigma^2(\bar\phi_1)f(\bar\phi_1)\,d\bar\phi_1 + O(\Delta\bar\phi). \tag{28} $$

See Appendix A for the derivation of (28). From this,

$$ \frac{1}{12}\bar\theta_1^2\int g(\bar\phi_1)^{-2}\sigma^2(\bar\phi_1)f(\bar\phi_1)\,d\bar\phi_1 \tag{29} $$

is considered to be a reasonable cost function when Assumptions 3.1 and 3.2 are satisfied.

In the following we assume Assumption 3.1 and Assumption 3.2 and give the optimal quantizers, which minimize (29), 
subject to a constraint on the number of quantization steps (Section 3.1) or on the expectation of the code length, where the 
quantized data is optimally encoded (Section 3.2). The former case is referred to as "fixed-rate quantization" because it is 
identical to a "fixed-code length" case; the latter case is referred to as "variable-rate quantization" and the code length is not 
fixed. 

3.1 Fixed-rate quantization 

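From the previous derivation, the original optimization problem of (24) can be replaced by the minimization of (29) in the $N \to \infty$
and high resolution case:

Problem 3.1 Find

$$ g_f(\bar\phi_1) := \arg\min_g\int\mathcal{F}(g(\bar\phi_1))\,d\bar\phi_1 \tag{30} $$
$$ \text{s.t.}\quad \int g(\bar\phi_1)\,d\bar\phi_1 = M, \tag{31} $$

where

$$ \mathcal{F}(g(\bar\phi_1)) := \frac{1}{12}\bar\theta_1^2\,g(\bar\phi_1)^{-2}\sigma^2(\bar\phi_1)f(\bar\phi_1). \tag{32} $$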



The following theorem gives the solution of this problem:

Theorem 3.1 The solution of (30) is:

$$ g_f(\bar\phi_1) = K\sigma^{2/3}(\bar\phi_1)f^{1/3}(\bar\phi_1) \tag{33} $$
$$ K = D^{-1}M \tag{34} $$
$$ D = \int\sigma^{2/3}(\bar\phi_1)f^{1/3}(\bar\phi_1)\,d\bar\phi_1. \tag{35} $$

Moreover, the optimized value is given by:

$$ \int\mathcal{F}(g_f(\bar\phi_1))\,d\bar\phi_1 = \frac{1}{12}\bar\theta_1^2 D^3 M^{-2}. \tag{36} $$

The minimization problem can be rigorously solved by applying the calculus of variations. See Appendix A for the proof.

From Theorem 3.1, the asymptotic optimal quantization at high resolution is readily calculated analytically, or numerically,
if the marginal density functions $f(\bar\phi_1)$ are known.

Note 3.3 The optimal quantization scheme on $y$ (call it $g_f(y)$) is also given by using the above results. With the relation
$y = \bar\phi_1\bar\theta_1$ and the fact that the optimal $g_f(\bar\phi_1)$ is given only by $f(\bar\phi_1)$, $g_f(y)$ on $y$ is a simple scaling of $g_f(\bar\phi_1)$. Therefore,
$g_f(y)$ on $y$ is given by (i) using the knowledge of $\bar\theta_1$ and $g_f(\bar\phi_1)$, or (ii) $f(y)$ on $y$ such that $g_f(y) = K'\sigma^{2/3}(y)f^{1/3}(y)$, where
$f(y)$ is obtained by the observation of the output data $\{y(t)\}$. Situation (i) is a standard problem setting of control systems
under limited channel capacity, where the quantizer (encoder) is supposed to be able to fully utilize information on the system
in order to optimally compress the data. Situation (ii) is also a natural problem setting.

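Example 3.1 When $f(\bar\phi)$ is a multidimensional normal distribution:

$$ f(\bar\phi_1, \bar\phi_2, \ldots, \bar\phi_n) = \frac{1}{(2\pi)^{n/2}(\det\Gamma)^{1/2}}\exp\left(-\frac{1}{2}\bar\phi\,\Gamma^{-1}\bar\phi^T\right), \quad \Gamma = \mathrm{diag}(\sigma^2, \sigma^2, \ldots, \sigma^2), $$

where $\Gamma$ is a covariance matrix of $\bar\phi$, then

$$ \sigma^2(\bar\phi_1) = \bar\phi_1^2 + (n-1)\sigma^2. $$

For simplicity, in the case that the order $n$ of the FIR model is sufficiently large,

$$ \sigma^2(\bar\phi_1)f(\bar\phi_1) \simeq n\sigma^2 f(\bar\phi_1). $$

Therefore: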

Denial J giijfo) ~ M (^J /I (0i)#i) 

J ^(fff(^i))#i - -^0? ^ n<7 2 M" 2 = ^e 2 6V3^na>/- 2 - 0.8658^ 2 n^M^ 2 . (37) 

Example 3.2 Here we consider another simple case $n = 1$, where the cost function becomes

$$ V[\bar U^TE] = N\int\bar\phi_1^2 e^2(\bar\phi_1)f(\bar\phi_1)\,d\bar\phi_1. $$

Then, the optimal quantization $g_f(\bar\phi_1)$ for this is given by

$$ g_f(\bar\phi_1) = K\bar\phi_1^{2/3}f^{1/3}(\bar\phi_1), \quad K = D^{-1}M, \quad D = \int\bar\phi_1^{2/3}f^{1/3}(\bar\phi_1)\,d\bar\phi_1. $$

We illustrate $g_f(\bar\phi_1)$ for the cases where $\sigma^2(\bar\phi_1) = \bar\phi_1^2 + \sigma^2$ and $f(\bar\phi_1)$ is the uniform distribution, normal distribution, or
power law as follows.

Fig. 4 is the case that $f(\bar\phi_1)$ is the uniform distribution. From the figure, we observe that the optimal quantization is coarse
near the origin of $\bar\phi_1$ and dense near the boundary of the domain of $\bar\phi_1$. Theorem 3.1 shows that the resolution increases
at a rate of about $\bar\phi_1^{2/3}$ for sufficiently large $\bar\phi_1$.

When $f(\bar\phi_1)$ is the normal distribution, the profile of the density $f(\bar\phi_1)$ near the origin is flat; therefore, the optimal quantizer
must have a profile near the origin similar to that for the uniform distribution. We can see such a profile of $g_f(\bar\phi_1)$
in Fig. 5. This property is, in some sense, the dual result to that of the quantization problem for stabilization by [8]; that is,
the coarsest quantization scheme for stabilization is dense near the origin and becomes coarser as distance from the origin
increases. These observations suggest that there appears to exist a trade-off between parameter estimation and stabilization
in the quantization scheme for a type of adaptive control system. On the other hand, in the area of the tail of $f(\bar\phi_1)$, $g_f(\bar\phi_1)$
decreases. However, contrary to our intuition, the resolution remains high, e.g., $g_f(3) \simeq 0.208 \simeq 45\%$ of $\max g_f(\bar\phi_1)$ or $g_f(4)
\simeq 0.0774 \simeq 17\%$ of $\max g_f(\bar\phi_1)$, where $f(\bar\phi_1)$ is sufficiently small.

Finally, $f(\bar\phi_1) \sim \bar\phi_1^{-2}$ at the tail of the distribution is an example of a power law. In this case, $g_f$ is constant in the tail and it is
marginal for the solution's existence (see Fig. 6). This result shows the difficulty of system identification at sufficient accuracy
by using finite information from the system when the tail of the probability density function $f(\bar\phi_1)$ is heavier than $O(\bar\phi_1^{-2})$.
That is, this explains the complexity of the power law from the viewpoint of parameter estimation in system identification.
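The closed form of Theorem 3.1 is easy to evaluate on a grid. The sketch below does this for a standard normal marginal density with $\sigma^2(\bar\phi_1) = \bar\phi_1^2 + \sigma^2$; the grid, the values of $M$ and $\bar\theta_1$, and the helper names `trapezoid` and `ratio_to_max` are assumptions of this illustration. It checks the constraint (31), the optimized value (36), and the relative resolutions in the tail quoted for the normal case.

```python
import numpy as np


def trapezoid(values, grid):
    """Plain trapezoidal rule on a one-dimensional grid."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid)))


sigma_u = 1.0                                   # input standard deviation (assumption)
phi = np.linspace(-12.0, 12.0, 240001)          # grid for phi_bar_1
f = np.exp(-phi**2 / (2 * sigma_u**2)) / np.sqrt(2 * np.pi * sigma_u**2)
sigma2 = phi**2 + sigma_u**2                    # sigma^2(phi_bar_1)

M, theta1 = 64, 1.0                             # illustrative values
shape = sigma2 ** (1 / 3) * f ** (1 / 3)        # sigma^{2/3} f^{1/3}
D = trapezoid(shape, phi)                       # eq. (35)
g_f = (M / D) * shape                           # eqs. (33) and (34)

# Cost functional: integral of (32) evaluated at g_f
cost = trapezoid(theta1**2 / 12 * sigma2 * f / g_f**2, phi)


def ratio_to_max(x):
    """Resolution at x relative to the peak resolution."""
    return float(np.interp(x, phi, g_f) / g_f.max())
```

The ratios at 3 and 4 come out at roughly 45% and 17% of the peak, matching the figures quoted above; the absolute values of the density depend on the normalization $M$, which is not stated for the figures.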



Fig. 4: Probability density $f(\bar\phi_1)$ of the regressor (solid line) in uniform distribution and the density function
of the number of the optimally quantized subsections $g_f(\bar\phi_1)$ (dashed line) when $\sigma^2(\bar\phi_1) = \bar\phi_1^2 + \sigma^2$

Fig. 5: Probability density $f(\bar\phi_1)$ of the regressor (solid line) in normal distribution and the density function
of the number of the optimally quantized subsections $g_f(\bar\phi_1)$ (dashed line) when $\sigma^2(\bar\phi_1) = \bar\phi_1^2 + \sigma^2$

Fig. 6: Power law ($O(\bar\phi_1^{-2})$) $f(\bar\phi_1)$ of the regressor (solid line) and the density function of the number of the
optimally quantized subsections $g_f(\bar\phi_1)$ (dashed line) when $\sigma^2(\bar\phi_1) = \bar\phi_1^2 + \sigma^2$



Note 3.4 As seen from Fig. 4 and Fig. 5, when $f(\bar\phi)$ is the normal distribution, uniform distribution, or another distribution
likely to arise in the usual situation of system identification, the marginal density $f(\bar\phi_1)$ is approximately flat near the origin and
the quantization becomes coarse in that region. Therefore, in order to clarify the minute structure of the optimal quantizer
around the origin, we should consider the problem at coarse resolution with a flat marginal density $f(\bar\phi_1)$. This case is
rigorously analyzed in Section 4. ◇

3.2 Variable-rate quantization 

The previous subsection presents the optimal quantizer to minimize the identification error (24) (i.e. (29)) subject to a con- 
straint on the number of quantization steps, i.e., fixed-rate quantization, with high resolution. On the other hand, to reduce 
the information in the observed data, it is reasonable to apply variable-rate coding for the quantized signals and evaluate the 
mean code length from the information theoretic viewpoint. From this observation, we consider the minimization problem of 
(24) (i.e., (29)) subject to a constraint of the expectation of the optimal code length in this subsection, that is, variable-rate 
quantization, with high resolution. 

Let $C(\cdot)$ be an encoder that is a mapping from source alphabets to code alphabets and $l(\cdot)$ be the code length. We regard the quantized output $q(\phi_1)$ as the corresponding source alphabets; then, $l(C(q(\phi_1)))$ represents the code length of $q(\phi_1)$. The expectation of the optimal variable-rate code length for a quantized signal is related to the entropy of the source alphabets by the following well-known source coding theorem.

Proposition 3.1 [20, 4] Let $x$ be source alphabets, then:

\[ E[l(C(x))] \ge H(x), \]

where $H(x)$ represents the entropy of $x$.
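
As a small numerical illustration of Proposition 3.1, the following sketch compares the entropy of a discrete source with the mean length of a binary Huffman code. The Huffman construction and the probabilities `cell_probs` are assumptions chosen for illustration only; the paper does not prescribe a particular encoder $C(\cdot)$.

```python
import heapq
import math


def entropy_bits(probs):
    """Shannon entropy H(x) in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def huffman_code_lengths(probs):
    """Code length l(C(x)) of each symbol under a binary Huffman code."""
    # Heap entries: (probability, unique tie-breaker, symbols in this subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    next_id = len(probs)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:
            lengths[s] += 1  # each merge adds one bit to every symbol below it
        heapq.heappush(heap, (p1 + p2, next_id, syms1 + syms2))
        next_id += 1
    return lengths


def mean_code_length(probs):
    """Expected code length E[l(C(x))] in bits."""
    lengths = huffman_code_lengths(probs)
    return sum(p * l for p, l in zip(probs, lengths))


# Hypothetical probabilities of four quantizer output levels.
cell_probs = [0.4, 0.3, 0.2, 0.1]
print(f"entropy          = {entropy_bits(cell_probs):.4f} bits")
print(f"mean code length = {mean_code_length(cell_probs):.4f} bits")
```

The mean code length never falls below the entropy, and the two coincide when all probabilities are powers of 1/2. This is why the entropy is used as the constraint in what follows.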

With this proposition, the optimization problem of the quantizer for the code length is reduced to the minimization of (24) (i.e., (29)) subject to a constraint on the entropy of the quantized signals.
The basic concept for representing the quantizer with high resolution is the same as that of the previous subsection. That is, under Assumptions 3.2.1 and 3.2.2, we obtain the asymptotic approximation of the entropy of the quantized signal:

\[ \sum_j -p_j \log p_j \approx \int -f(\phi_1)\log\left(f(\phi_1)\,g^{-1}(\phi_1)\right) d\phi_1 = H_d(f) + \int -f(\phi_1)\log\left(g^{-1}(\phi_1)\right) d\phi_1 =: H(f, g), \tag{38} \]

where $H_d(f) := \int -f(\phi_1)\log f(\phi_1)\,d\phi_1$. By using this asymptotic approximation of the entropy (38), we consider the following problem.



Problem 3.2 Find

\[ g_v(\phi_1) := \arg\min_{g} \int \mathcal{F}(g(\phi_1))\,d\phi_1 \tag{39} \]
\[ \text{s.t.}\quad H(f, g) = \log M, \tag{40} \]

where $\mathcal{F}(\cdot)$ is defined in (32).

Note that $M$ is the expected number of quantization steps in the sense of (40). We can derive the following theorem:

Theorem 3.2 The solution of (39) is:

\[ g_v(\phi_1) = KM\sigma(\phi_1) \tag{41} \]
\[ K = \exp L \tag{42} \]
\[ L := -H_d(f) - \int f(\phi_1)\log\sigma(\phi_1)\,d\phi_1 = \int f(\phi_1)\log\frac{f(\phi_1)}{\sigma(\phi_1)}\,d\phi_1. \tag{43} \]
Moreover, the optimized value is:

\[ \int \mathcal{F}(g_v(\phi_1))\,d\phi_1 = \frac{N}{12}\theta_1^2 K^{-2} M^{-2}. \tag{44} \]



The proof is in Appendix A. 
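
The constraint (40) can be checked numerically. The sketch below is only an illustration under assumed inputs: a standard normal marginal density $f(\phi_1)$ and $\sigma(\phi_1) = \sqrt{\phi_1^2 + s^2}$ with a hypothetical constant `s`. It builds the quantizer cells from the density $g_v = KM\sigma$ of (41)-(43) and measures the actual entropy of the quantized signal, which should be close to $\log M$ at high resolution.

```python
import math


def variable_rate_entropy(M=64, s=1.0, lim=8.0, steps=200_000):
    """Entropy (in nats) of a standard normal signal quantized with cell density g_v."""
    h = 2.0 * lim / steps
    xs = [-lim + (i + 0.5) * h for i in range(steps)]  # midpoint grid on [-lim, lim]

    def f(x):  # assumed marginal density: standard normal
        return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

    def sigma(x):  # assumed form of sigma(phi_1)
        return math.sqrt(x * x + s * s)

    # (43): L = integral of f log(f / sigma); (42): K = exp(L).
    L = sum(f(x) * math.log(f(x) / sigma(x)) for x in xs) * h
    K = math.exp(L)

    # Integrate g_v = K M sigma; a new cell starts each time the running
    # cell count passes an integer.
    cell_prob = {}
    count = 0.0
    for x in xs:
        count += K * M * sigma(x) * h
        j = int(count)
        cell_prob[j] = cell_prob.get(j, 0.0) + f(x) * h
    return -sum(p * math.log(p) for p in cell_prob.values() if p > 0.0)


M = 64
print(f"entropy of quantized signal = {variable_rate_entropy(M):.4f}")
print(f"log M                       = {math.log(M):.4f}")
```

The agreement relies on the high-resolution approximation (38), so it degrades for small $M$.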



Note 3.5 It is interesting that the optimal $g_v$ is a simple linear function of $\sigma(\phi_1)$. The constant coefficient is also linear with respect to the expected number of quantization steps $M$. On the other hand, the convergence rate of the minimized cost function is $M^{-2}$, which is the same as for fixed-rate quantization. $\diamond$

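Example 3.3 When $f(\phi)$ is the density function of a multidimensional normal distribution and $n$ is sufficiently large, as described in Example 3.1,

\[ g_v(\phi_1) = KM\sigma(\phi_1) \approx M\exp(-H_d(f)) \]
\[ \int \mathcal{F}(g_v(\phi_1))\,d\phi_1 \approx \frac{N\theta_1^2}{12}\exp(2H_d(f))\,n\sigma^2 M^{-2} = \frac{N\theta_1^2}{12}\,2\pi e\,n\sigma^4 M^{-2} \approx 1.4233\,N\theta_1^2\,n\sigma^4 M^{-2}. \tag{45} \]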

Comparing (37) and (45), it can be seen that variable-rate optimal coding achieves approximately half the squared quantization error of fixed-rate quantization. $\diamond$

4 Quantization in Coarse Resolution



In the previous section, we gave the optimal quantization at high resolution for general probability densities of the input signals. The results suffice for understanding the profile of the optimal quantization; however, as explained in Note 3.4, its fine structure around the origin is not clear in the case of coarse quantization. In this section, we do not necessarily assume high resolution and derive the optimal quantization, although under the following restricted assumption.

Assumption 4.1 $f(\phi)$ is a probability density function such that $f(\phi)$ is a uniform distribution on $\phi_i \in [-\kappa, \kappa]$ with a given $\kappa\ (\in \mathbb{R}) > 0$.

The optimization problem under this assumption has clear significance in the following two cases. (1) It clarifies the fine quantization scheme around the origin of $y$: the profile of the multidimensional probability densities of usual input signals in system identification, e.g., the normal distribution, is flat around the origin. In that region, the quantization is comparatively coarse and the probability density can be approximated by a uniform distribution. The important fact is that this flatness of the probability density around the origin does not depend on the choice of the basis in the space of $\phi$. This means that the condition of Assumption 4.1 is always satisfied around the origin in the usual situation of system identification. (2) It covers first-order systems whose input signals obey a uniform distribution. In this case, the analytic optimal solution in coarse quantization can be given, and it suffices for the main subject of this paper, which is to clarify the essential properties of the optimal quantizers for parameter estimation.

When Assumption 4.1 is satisfied, as similar to the case of Section 3, j^U T U and j^U T U also converge to a^I when 



N — > oo, then the optimal quantization problem is also reduced to minimize V [U T E] ( 



= V 



U T E 



of (22) subject to a 



= ,i.e. (19) and (20). 



bias free condition: E [f/ T i?] = ^equivalently E 
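Under Assumption 4.1, it is obvious that

\[ \int \phi_k f(\phi_1, \phi_k)\,d\phi_k = 0 \tag{46} \]

for $k \neq 1$; then, (19) is automatically satisfied. Therefore, the bias-free condition is reduced to (20). Moreover, (20) means

\[ \int \phi_1 e(\phi_1) f(\phi_1, \phi_2, \ldots, \phi_n)\,d\phi_1 = 0 \tag{47} \]

under Assumption 4.1. A sufficient condition for (20) is

\[ E_{S_j^{\phi_1}}[\phi_1 e(\phi_1)] := \int_{S_j^{\phi_1}} \phi_1 e(\phi_1) f(\phi_1)\,d\phi_1 = \int_{S_j^{\phi_1}} \phi_1\left(y'_{(j)} - \theta_1\phi_1\right) f(\phi_1)\,d\phi_1 = 0, \quad \forall j. \tag{48} \]

This condition is sufficiently reasonable for the representative number $y'_{(j)}$ of the subsection $S_j^y$ (or the corresponding $S_j^{\phi_1}$ on $\phi_1$).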

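On the other hand, we can derive the following key lemma for the cost function $V[U^T E]$ of (22):

Lemma 4.1 Subject to the conditions:

\[ \int \phi_k f(\phi_1, \ldots, \phi_k, \ldots, \phi_n)\,d\phi_k = 0, \quad \forall k = 1, 2, \ldots, n \tag{49} \]
\[ \text{and}\quad \int \phi_1 e(\phi_1) f(\phi_1)\,d\phi_1 = 0, \tag{50} \]

\[ E\left[\left(\sum_{t=1}^{N} \phi_k(t)e(t)\right)^2\right] = \begin{cases} N \int \phi_1^2 e^2(\phi_1) f(\phi_1)\,d\phi_1 & \text{for } k = 1 \\ N \iint \phi_k^2 e^2(\phi_1) f(\phi_1, \phi_k)\,d\phi_1 d\phi_k & \text{for } k \neq 1 \end{cases} \tag{51} \]

is satisfied.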



The proof of this lemma is in Appendix A. 

Assumption 4.1 automatically guarantees condition (46), i.e., (49); therefore, together with the bias-free condition (50), (51) follows from Lemma 4.1. With these preliminaries, we formulate the problem considered in this section:

Problem 4.1 Let $M$ be the number of quantized subsections $S_j^y$ of $[-\kappa_y, \kappa_y] := [-\kappa\theta_1, \kappa\theta_1]$ on $y$ (i.e., $S_j^{\phi_1}$ of $[-\kappa, \kappa]$ on $\phi_1$), where $M \ge 2$. For the system (1) with Assumption 4.1 and a fixed $M$, find a quantizer $q$ that minimizes

\[ V[U^T E] = \sum_{k=1}^{n} E\left[\left(\sum_{t=1}^{N} \phi_k(t)e(t)\right)^2\right] = N \int \sigma^2(\phi_1) e^2(\phi_1) f(\phi_1)\,d\phi_1 \tag{52} \]

such that $E_{S_j^{\phi_1}}[\phi_1 e(\phi_1)] = 0$ for all $j$.

The reason for the constraint $M \ge 2$ is described in Note 4.1.

As described in Section 2, the quantization scheme of $[-\kappa\theta_1, \kappa\theta_1]$ on $y$ is essentially equal to that of $[-\kappa, \kappa]$ on $\phi_1$, and it is completely defined by the setting of the subsections $S_{-M'}^{\phi_1}, \ldots, S_{-1}^{\phi_1}, S_0^{\phi_1}, S_1^{\phi_1}, S_2^{\phi_1}, \ldots, S_{M'}^{\phi_1}$, where

\[ M' := \begin{cases} \frac{1}{2}M & \text{for even } M\ (\ge 2) \\ \frac{1}{2}(M-1) & \text{for odd } M\ (\ge 3) \end{cases} \tag{53} \]

and the assigned quantized values

\[ q(y)\big|_{y \in S_j^y} =: y'_{(j)} \]

for each subsection $S_j^y$ or $S_j^{\phi_1}$ (see Fig. 7). Therefore, optimization of the quantization is reduced to a minimization problem of $V[U^T E]$ in approximately $2M$ variables ($d_{-(M'-1)}, \ldots, d_{M'-1}$ and $y'_{(-M')}, \ldots, y'_{(M')}$; note that $d_{M'} = \kappa\theta_1$ and $d_{-M'} = -\kappa\theta_1$).

Fig. 7 The quantization scheme of $q$

In this section, we consider the case of even $M$. The case of odd $M$, that is, $S_0^y \neq \{0\}$ ($S_0^{\phi_1} \neq \{0\}$), is reduced to the even case, and the reason is explained in Note 4.1. We also refer only to the positive domain $S_1^y, S_2^y, \ldots$ because of the symmetry of the quantization.

It is known that when a subsection $S_j^y$ is fixed (i.e., $d_{j-1}$ and $d_j$ are fixed), $y'_{(j)}$ is given by the bias-free condition $E_{S_j^{\phi_1}}[\phi_1 e(\phi_1)] = 0$. Therefore, the optimization problem is reduced to finding optimal $d_{-(M'-1)}, \ldots, d_{M'-1}$. Corresponding to $d_j$, we introduce key variables, the ratios $r_j$ ($j = 1, \ldots, M'-1$) between $d_j$ and $d_{j+1}$, defined by:

\[ d_j = r_j d_{j+1}. \tag{54} \]

Note that determining optimal $d_{-(M'-1)}, \ldots, d_{M'-1}$ is equal to determining optimal $r_{-(M'-1)}, \ldots, r_{M'-1}$, and we derive the following result.



Proposition 4.1 The optimal ratios $r_j^o$ for Problem 4.1 are given by solving the following recursive optimization problem iteratively.

\[ r_j^o = \arg\min_{r_j \in [0,1]} \left( d_{j+1}^5\,\psi(r_j; \psi_{j-1}^{\min}) + 20\kappa_y^2(n-1)\,d_{j+1}^3\,\xi(r_j; \xi_{j-1}^{\min}) \right) \tag{55} \]
\[ \psi(r; a) := ar^5 - 18(1-r)^5 + 45(1+r)^2(1-r)^3 + 5(1-r)^7(1+r)^{-2} \]
\[ \psi_j^{\min} := \psi(r_j^o; \psi_{j-1}^{\min}), \qquad \psi_0^{\min} := 32 \tag{56} \]
\[ \xi(r; a) := ar^3 + 3(1-r)^3 + \frac{(1-r)^5}{(1+r)^2} \]
\[ \xi_j^{\min} := \xi(r_j^o; \xi_{j-1}^{\min}), \qquad \xi_0^{\min} := 4 \tag{57} \]

The optimal value of (52) is given by

\[ \min_q V[U^T E] = \min_q \sum_{j=-M'}^{M'} V_{S_j^{\phi_1}}[U^T E] = \frac{N}{2160}\kappa^4\left(\psi_{M'-1}^{\min} + 20(n-1)\xi_{M'-1}^{\min}\right). \tag{58} \]

See Appendix A for the proof. 
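
The rational functions $\psi$ and $\xi$ can be checked against direct numerical integration for the first two subsections $(0, d_1]$ and $(d_1, d_2]$ with $d_1 = r d_2$. The sketch below is a verification aid and not part of the paper's derivation. It omits the constant density factor $1/(2\kappa)$, writes the integrals in a single variable `p`, and the helper names are hypothetical.

```python
def psi(r, a):
    """psi(r; a) as defined in Proposition 4.1."""
    return (a * r**5 - 18 * (1 - r)**5
            + 45 * (1 + r)**2 * (1 - r)**3 + 5 * (1 - r)**7 / (1 + r)**2)


def xi(r, a):
    """xi(r; a) as defined in Proposition 4.1."""
    return a * r**3 + 3 * (1 - r)**3 + (1 - r)**5 / (1 + r)**2


def cell_costs(lo, hi, steps=20_000):
    """Midpoint-rule integrals of p^2 e^2 and e^2 over (lo, hi], with e = p - rep."""
    h = (hi - lo) / steps
    pts = [lo + (i + 0.5) * h for i in range(steps)]
    # Bias-free representative: integral of p * (rep - p) over the cell is zero.
    rep = ((hi**3 - lo**3) / 3.0) / ((hi**2 - lo**2) / 2.0)
    cost_p2e2 = sum(p * p * (p - rep)**2 for p in pts) * h
    cost_e2 = sum((p - rep)**2 for p in pts) * h
    return cost_p2e2, cost_e2


def two_cell_check(r, d2=1.0):
    """Return (numeric, closed form) pairs for the psi part and the xi part."""
    d1 = r * d2
    a4, a2 = cell_costs(0.0, d1)
    b4, b2 = cell_costs(d1, d2)
    return (a4 + b4, d2**5 * psi(r, 32.0) / 2160.0,
            a2 + b2, d2**3 * xi(r, 4.0) / 36.0)


num_psi, form_psi, num_xi, form_xi = two_cell_check(0.6)
print(num_psi, form_psi)
print(num_xi, form_xi)
```

The initial values $\psi_0^{\min} = 32$ and $\xi_0^{\min} = 4$ are the costs of the single subsection $(0, d_1]$ in the same normalization, which is why they appear as the coefficients of $r^5$ and $r^3$.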

Note 4.1 For odd $M$, there must not exist a subsection $S_0^y$ (i.e., $S_0^{\phi_1}$) of nonzero width that contains the origin of $y$ (i.e., the origin of $\phi_1$), because for any such subsection and any setting of $y'_{(0)}$, $E_{S_0^{\phi_1}}[\phi_1 e(\phi_1)] \neq 0$. This means that $S_0^y$ (i.e., $S_0^{\phi_1}$) should be $\{0\}$, and consequently the problem is equal to the case of even $M$ with the setting $M' = \frac{1}{2}(M-1)$. $\diamond$

Example 4.1 Consider the following second-order FIR model as an example of (1):

\[ y(t) = \theta_1 u(t) + \theta_2 u(t-1), \tag{59} \]

where $\theta_1 = 1$ and $\theta_2 = \frac{1}{2}$ and the system is noise free. We generate 50 sets of I/O data sequences with a length $N = 10{,}000$ for the system (59) that obey Assumption 4.1 with $\kappa = 4$ (i.e., $\kappa_y = 4$). Fig. 8 shows the histogram of the 10,000 samples of $\phi_1$ from one of the 50 sets.

Next, we quantize the output data $y$ with the optimal quantizer given by Proposition 4.1 and, for comparison, with a uniform quantizer, subject to the constraint $M' = 5$ ($M = 10$). Fig. 9 shows the step function $q$ for $y$ of the optimal quantizer for $M' = 5$. Fig. 9 indicates a basic property of the optimal quantizer: it is coarse near the origin and becomes denser away from the origin.

The bias term $\frac{1}{N}\sum_{t=1}^{N}\phi_1(t)e(t)$ and the quantization error term $\Delta E$ were calculated; Table 1 shows a summary of the results. From Table 1, the optimal quantizer, which minimizes $V[U^T E]$, attains a lower $\|\Delta E\|_2^2$ than the uniform quantizer.

Table 1: The ratios of the biases and the squares of errors for $M' = 5$ (averages of 50 sets)

  Ratio (optimal quantizer / uniform quantizer)          Value
  $|\sum_{t=1}^{10000}\phi_1(t)e(t)|$                     0.1107
  $\|\Delta E\|_2^2$                                      0.0132







Proposition 4.1 shows that the problem belongs to the class of typical dynamic programming problems and can be solved numerically. In general, the computational complexity of this problem is high; however, the optimization problem (55) can be solved in very few calculation steps in the special cases $n = 1$ and $n \gg 1$, as shown in the following theorem:

Theorem 4.1 When $n = 1$, the optimal ratios $r_j^o$ for Problem 4.1 are given by solving the following optimization problem iteratively.

\[ r_j^o = \arg\min_{r_j \in [0,1]} \psi(r_j; \psi_{j-1}^{\min}) \tag{60} \]
\[ \psi_j^{\min} := \psi(r_j^o; \psi_{j-1}^{\min}), \qquad \psi_0^{\min} := 32. \tag{61} \]

The optimal value of (52) is given by

\[ \min_q V[U^T E] = \frac{N}{2160}\kappa^4\psi_{M'-1}^{\min}. \tag{62} \]

Similarly, when $n \gg 1$, the optimal ratios $r_j^o$ for Problem 4.1 converge to the solution of the following optimization problem.

\[ r_j^o = \arg\min_{r_j \in [0,1]} \xi(r_j; \xi_{j-1}^{\min}) \tag{63} \]
\[ \xi_0^{\min} := 4. \tag{64} \]

The optimal value of (52) converges to

\[ \frac{N}{108}\kappa^4 (n-1)\,\xi_{M'-1}^{\min}. \tag{65} \]



Note 4.2 The essential difference between the optimization problem (55) and problems (60) or (63) is that in the former, $r_j^o$ depends on $d_{j+1}$, which requires a complex calculation such as dynamic programming, whereas in the latter, $r_j^o$ does not depend on $d_{j+1}$, and $\{r_j^o\}$ can be obtained by solving (60) or (63) from $j = 1$ to $j = M'-1$ in turn only once. This means that the original minimization of the function $V[U^T E]$ of approximately $2M$ variables can be reduced to a recursive minimization of a single one-variable rational function when $n = 1$ or $n \gg 1$. Moreover, when $n = 1$, from Lemma A.1 in Appendix A, the local minimum of $\psi(r; a)$, $a > 0$, in $r \in (0, 1)$ is unique. Therefore, finding the minimizer does not require a highly complex calculation. $\diamond$
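
For $n = 1$, the recursion (60)-(61) can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation: the grid search with golden-section refinement is an assumed choice of one-dimensional minimizer, and the function names are hypothetical. It returns the ratios $r_1^o, \ldots, r_{M'-1}^o$ and the positive boundaries $d_1 < \cdots < d_{M'} = \kappa$ obtained from $d_j = r_j d_{j+1}$.

```python
def psi(r, a):
    """psi(r; a) as defined in Proposition 4.1."""
    return (a * r**5 - 18 * (1 - r)**5
            + 45 * (1 + r)**2 * (1 - r)**3 + 5 * (1 - r)**7 / (1 + r)**2)


def argmin_on_unit_interval(fun, grid=20_000, refine=60):
    """Coarse grid search on [0, 1] followed by golden-section refinement."""
    best_i = min(range(grid + 1), key=lambda i: fun(i / grid))
    lo = max(best_i - 1, 0) / grid
    hi = min(best_i + 1, grid) / grid
    inv_phi = (5 ** 0.5 - 1) / 2
    for _ in range(refine):
        x1 = hi - inv_phi * (hi - lo)
        x2 = lo + inv_phi * (hi - lo)
        if fun(x1) < fun(x2):
            hi = x2
        else:
            lo = x1
    return 0.5 * (lo + hi)


def optimal_boundaries(m_half, kappa):
    """Solve (60)-(61) for j = 1..M'-1 and convert the ratios to boundaries."""
    ratios, psi_min = [], 32.0  # psi_0^min = 32
    for _ in range(m_half - 1):
        r = argmin_on_unit_interval(lambda x: psi(x, psi_min))
        ratios.append(r)
        psi_min = psi(r, psi_min)
    d = [kappa]  # d_{M'} = kappa
    for r in reversed(ratios):
        d.insert(0, r * d[0])  # d_j = r_j * d_{j+1}
    return ratios, d, psi_min


ratios, d, psi_min = optimal_boundaries(m_half=5, kappa=4.0)
widths = [d[0]] + [b - a for a, b in zip(d, d[1:])]
print("ratios     :", [round(r, 4) for r in ratios])
print("boundaries :", [round(x, 4) for x in d])
print("cell widths:", [round(w, 4) for w in widths])
```

Each step is a single scalar minimization, so the whole quantizer for $M' = 5$ needs only four of them. By symmetry, the negative boundaries are the mirror images of the positive ones.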

In the remainder of this section, we focus on the case $n = 1$ because it is a basic problem and reveals typical properties of the optimal quantization. We call the optimal quantization scheme $Q_{opt}$ hereafter.
Every optimal ratio $r_j^o$ can be explicitly determined by solving (60)-(61) iteratively; however, the properties of the sequence $r_1^o, r_2^o, \ldots$ are not clear from (60)-(61). For the asymptotic characteristics of the optimal ratios $r_j^o$ ($j = 1, 2, \ldots$) and related quantities, we derive the following series of Lemmas 4.2-4.5.



Lemma 4.2 The optimal ratios $r_j^o$ satisfy:

\[ r_j^o < r_{j+1}^o, \quad \forall j > 0, \]
\[ r_j^o \to 1, \quad j \to \infty. \]

Lemma 4.3 The widths of the subsections $S_j^y$ or $S_j^{\phi_1}$ of $Q_{opt}$ satisfy:

\[ |S_j^y| > |S_{j+1}^y|, \quad \forall j > 0, \]

where $|\cdot|$ denotes the width of the subsection.

The proofs of these lemmas are in Appendix A.

Lemma 4.3 shows that the optimal quantization scheme $Q_{opt}$ is coarse near the origin of $y$ and becomes denser as $y$ tends to the boundaries of $[-\kappa_y, \kappa_y]$. This property coincides with the results in Section 3, and it is also the dual of the result for the quantization problem for stabilization in [8], as mentioned in Section 3.

Next, consider the unboundedness of $\prod_{j=1}^{\infty} \frac{1}{r_j^o}$. If it is bounded and $\prod_{j=1}^{\infty} \frac{1}{r_j^o} = \gamma < \infty$, then this contradicts the optimality of $Q_{opt}$: when a region $[-\gamma, \gamma]$ of $\phi_1$ is quantized, the width of $S_1^{\phi_1}$, for example, is never smaller than 1 even if the number of quantization levels increases to infinity. Of course, this is not true, and $\prod_{j=1}^{\infty} \frac{1}{r_j^o}$ is therefore unbounded. The next lemma states this fact rigorously. Refer to [24] for the proof.

Lemma 4.4 The optimal ratios $r_j^o$ satisfy:

\[ \prod_{j=1}^{\infty} \frac{1}{r_j^o} = \infty. \]

From Lemma 4.2 to Lemma 4.4, we know the outline of the quantization of the region $[-\kappa_y, \kappa_y]$.

Next, to clarify the profile of $V[U^T E]$ with respect to $M'$, the following lemma confirms the asymptotic characteristics of $\psi_j^{\min}$.

Lemma 4.5 The minimized quantity $\psi_j^{\min}$ of (56) at $j = M'$ converges as

\[ \psi_{M'}^{\min} \to \psi(M'), \quad M' \to \infty, \]

where $a = -5 \cdot 3^{-5/2}$ and $b = \frac{3}{2}$, and $\psi(m)$ is a function of integer $m$ defined as the solution of the following recurrence formula with an appropriate initial number $\psi(0) = \psi_0$:

\[ \psi(m) - \psi(m-1) = a\,\psi^b(m-1). \tag{66} \]



The proof is in Appendix A. 
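
The asymptotic recurrence of Lemma 4.5 can be examined numerically by iterating (60)-(61) and looking at the normalized decrement $(\psi_j^{\min} - \psi_{j-1}^{\min}) / (\psi_{j-1}^{\min})^{3/2}$, which should approach $a = -5 \cdot 3^{-5/2} \approx -0.3208$ as $j$ grows. The following sketch assumes a simple grid search with golden-section refinement as the minimizer; the helper names are hypothetical.

```python
def psi(r, a):
    """psi(r; a) as defined in Proposition 4.1."""
    return (a * r**5 - 18 * (1 - r)**5
            + 45 * (1 + r)**2 * (1 - r)**3 + 5 * (1 - r)**7 / (1 + r)**2)


def min_over_unit_interval(fun, grid=2_000, refine=60):
    """Minimum value of fun on [0, 1]: grid search plus golden-section refinement."""
    best_i = min(range(grid + 1), key=lambda i: fun(i / grid))
    lo = max(best_i - 1, 0) / grid
    hi = min(best_i + 1, grid) / grid
    inv_phi = (5 ** 0.5 - 1) / 2
    for _ in range(refine):
        x1 = hi - inv_phi * (hi - lo)
        x2 = lo + inv_phi * (hi - lo)
        if fun(x1) < fun(x2):
            hi = x2
        else:
            lo = x1
    return fun(0.5 * (lo + hi))


def psi_min_sequence(steps):
    """psi_0^min, psi_1^min, ..., psi_steps^min from (60)-(61)."""
    seq = [32.0]
    for _ in range(steps):
        a_prev = seq[-1]
        seq.append(min_over_unit_interval(lambda r: psi(r, a_prev)))
    return seq


seq = psi_min_sequence(200)
a_empirical = (seq[-1] - seq[-2]) / seq[-2] ** 1.5
a_theory = -5 * 3 ** -2.5
print(f"empirical coefficient = {a_empirical:.4f}")
print(f"a = -5 * 3^(-5/2)     = {a_theory:.4f}")
```

The agreement improves slowly with $j$, because the neglected terms are of relative order $(\psi_{j-1}^{\min})^{1/2}$.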
Note that the recurrence formula (66) is from (90) in Appendix A, and it can be approximated by $\bar{\psi}$, which is a solution of a differential equation:

\[ \frac{d\bar{\psi}(m)}{dm} = (a + \nu)\bar{\psi}^b(m) > a\bar{\psi}^b(m) + o(\bar{\psi}^b(m)) = a\bar{\psi}^b(m) + O(\bar{\psi}^2(m)) = \mathcal{P}(\bar{\psi}(m)), \quad m \in \mathbb{R}, \]

where $\mathcal{P}(\cdot)$ is defined in (90) and $\nu > 0$ is an appropriate constant satisfying $a + \nu < 0$ and the above inequality (such $\nu$ always exists). We can show $\bar{\psi}(m) > \psi(m)$ at sufficiently large integer $m$ when $\bar{\psi}(0) > \psi(0)$ in Lemma A.2. Then, we obtain the solution

\[ \bar{\psi}(m) = \left\{(-b + 1)(a + \nu)m + B\right\}^{\frac{1}{-b+1}} \tag{67} \]
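for an appropriate constant $B$. From (62) and (67), we obtain

\[ \min_q V[U^T E] < \frac{N}{2160}\kappa^4\left(\left(-\tfrac{3}{2} + 1\right)\left(-5 \cdot 3^{-5/2} + \nu\right)(M' - 1) + B\right)^{\frac{1}{-3/2+1}} = \bar{A}\kappa^4 (M' - \bar{B})^{-2} \]
\[ \bar{A} := \frac{N}{540}\left(5 \cdot 3^{-5/2} - \nu\right)^{-2}, \qquad \bar{B} := 1 - 2\left(5 \cdot 3^{-5/2} - \nu\right)^{-1} B. \tag{68} \]

Inequality (68) approximately shows the relationship between the optimized quantization error $\min_q V[U^T E]$ and the number of quantization levels.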



Example 4.2 Consider the following first-order FIR model for verifying the above results:

\[ y(t) = \theta u(t), \tag{69} \]

where $\theta = 2$ and the system is noise free. We again generate 50 sets of I/O data sequences with a length $N = 10{,}000$ for the system (69) that obey Assumption 4.1 with $\kappa = 4$ (i.e., $\kappa_y = 8$).

Next, we quantize the output data $y$ with the optimal quantizer given by Theorem 4.1 and, for comparison, with a uniform quantizer, subject to the constraint $M' = 5$ ($M = 10$). Fig. 10 shows the step function $q$ for $y$ of the optimal quantizer for $M' = 5$. Compared with Fig. 9, Fig. 10 shows more clearly the property of the optimal quantizer: it is coarse near the origin and becomes denser away from the origin.

Table 2 shows a comparison of the bias term $\frac{1}{N}\sum_{t=1}^{N}\phi_1(t)e(t)$ and the quantization error term $\Delta E$. From Table 2, the optimal quantizer, which minimizes $V[U^T E]$, attains a lower $\|\Delta E\|_2^2$ than the uniform quantizer.

Fig. 10 Optimal quantization scheme $Q_{opt}$ for $M' = 5$

Table 2: The ratios of the biases and the squares of errors for $M' = 5$ (averages of 50 sets)

  Ratio ($Q_{opt}$ / uniform quantizer)                   Value
  $|\sum_{t=1}^{N}\phi_1(t)e(t)|$                         0.0135
  $\|\Delta E\|_2^2$                                      0.0116







5 Resolution of the Quantizer and I/O Data Length 

In the system identification of (1), it is important to clarify the relationship between the estimation error and the amount of signal data used for the estimation. The amount of signal data is the resolution of the quantization multiplied by the length of the signal sequence. Using the results in the previous sections, we evaluate the magnitudes of the error terms $\Delta E$ and $\Delta W$ based on the approach in [25] and compare the effects of the resolution of the quantizer and the length of the signal sequence.

First, we evaluate the magnitude of $(U^T U)^{-1}$.

Lemma 5.1 [25] Assume that (j> satisfies Assumption 3.1 and 3.2 with V[0i(i)] = cr?^ V[0f (t)] = rj. Then, for any reliability 
index j3\ > 0, where 1 — /?i > 0, and er~ N — n^Jj^- {^/fj + (n — l)cr? ^ > 0, the following inequality is satisfied. 

Prob(||([7 T C/)- 1 || 1 > £1 ) <ft 

ei := ^r-± (70) 

Using Lemma 5.1, we evaluate $\|\Delta E\|_\infty$ in the following theorem.

Theorem 5.1 For the system (1) with the optimal quantizer q(y) defined by (3) - (5), (33), assume Assumption 3.1 and 3.2. 
Then, for the reliability indices [3\, ft > 0, a length of data N and the number of quantization levels M, where 1 — /3i — /?2 > 0, 
and er~ N — n \Jj^[ + ( n ~ ^> G \ ) > ^' f ^ e following inequality asymptotically holds at Ay — > 0: 

\[ \mathrm{Prob}\left(\|\Delta E\|_\infty < \epsilon_1\epsilon_2\right) > 1 - \beta_1 - \beta_2 \tag{71} \]



The proof is in Appendix A. 

From this theorem, we know that the convergence rate of the error term $\|\Delta E\|_\infty$ is of order $M^{-1}$ for sufficiently large $M$ and of order $N^{-1/2}$. The total amount of information in the quantized output transmitted from the identified system to the observer is approximately $N \log_2 M =: \mathcal{K}$ using binary coding. Therefore, subject to a constraint on this total amount of information, a large $M$ is preferable to a large $N$ for reducing the estimation error, as seen from:

\[ M^{-1} N^{-1/2} = M^{-1}\left(\frac{\mathcal{K}}{\log_2 M}\right)^{-1/2} = \mathcal{K}^{-1/2} M^{-1} (\log_2 M)^{1/2} \xrightarrow{M \to \infty} 0. \]
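
The effect of splitting a fixed information budget between resolution and data length can be tabulated directly. The sketch below evaluates only the quantization error scale $M^{-1}N^{-1/2}$ with $N = \mathcal{K} / \log_2 M$; the budget value is a hypothetical example, and the noise term $\Delta W$, which behaves differently, is not modeled.

```python
import math


def quantization_error_scale(M, budget_bits):
    """M^{-1} N^{-1/2} when the budget K = N log2(M) bits is fixed."""
    N = budget_bits / math.log2(M)  # data length affordable at resolution M
    return 1.0 / (M * math.sqrt(N))


budget = 1_000_000  # hypothetical total number of transmitted bits
for M in (2, 4, 16, 64, 256):
    N = budget / math.log2(M)
    print(f"M = {M:4d}, N = {N:10.0f}, scale = {quantization_error_scale(M, budget):.3e}")
```

For $M \ge 2$ the scale decreases monotonically in $M$, which is the sense in which a large $M$ is preferable for the quantization error alone.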

Of course, this is valid only for the error term $\|\Delta E\|_\infty$, and the situation is different for the noise error term $\Delta W$. We introduce the result for $\Delta W$ in the following proposition.
Proposition 5.1 [25] Assume that $\phi$ satisfies Assumptions 3.1 and 3.2 and $w(t)$ is an i.i.d. random variable, with $V[\phi_i(t)] = \sigma_\phi^2$ and $V[w(t)] = \sigma_w^2$, respectively. Then, for reliability indices $\beta_1, \beta_2 > 0$ and a length of data $N$, where $1 - \beta_1 - \beta_2 > 0$, and



This result shows that a large $N$ is preferable for reducing $\Delta W$. By combining Theorem 5.1 and Proposition 5.1, it can be seen that there exists a trade-off between $\Delta E$ and $\Delta W$ in reducing the total identification error subject to the constraint on the amount of information transmitted from the identified system to the estimator.

6 Conclusion 

In this paper, we show that the optimal quantizers for system identification can be derived analytically and their essential 
properties investigated with a simple FIR model. The results of this paper are summarized as follows: 

(1) General cases of the distribution of regressor vectors can be treated for high resolution quantizers by introducing the 
concept of the density of quantization subsections (Section 3). 

(2) The optimization problems in (1) are reduced to minimizations of functionals and the solutions can be found by solving 
Euler-Lagrange differential equations (Section 3). 

(3) When the regressor vector has a uniform distribution, the optimal quantization problem is reduced to a recursive minimization, which can be solved by dynamic programming (Section 4).

(4) In the usual situation, the optimal quantizer is coarse near the origin of the output signals and tends to be dense away from the origin (Section 3 and Section 4).

(5) Subject to a limitation on the total quantity of information in the quantized I/O data, there exists a trade-off between the 
magnitudes of the quantization error and noise error (Section 5). 

In this paper, we restricted the model to a SISO FIR model. For more realistic situations, the results must be extended to: a) ARX models or MIMO systems, b) quantized input signals, and c) online system identification and adaptive control. These remain for future study.

A Appendix



Slutsky's Theorem (e.g. [14])

For sequences of stochastic variables $X(i)$, $Y(i)$, assume that $\mathrm{plim}_{i\to\infty}[X(i)]$ and $\mathrm{plim}_{i\to\infty}[Y(i)]$ converge to constants. Then,

\[ \mathrm{plim}[X(i)^{-1}Y(i)] = \left(\mathrm{plim}[X(i)]\right)^{-1}\mathrm{plim}[Y(i)] \]

holds.

Proof of Lemma 3.1 

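The outline of the proof is similar to that of Lemma 4.1, and we evaluate the value of $E[\phi_a e(\phi_b)\phi_c e(\phi_d)]$ for the possible cases in (23): $a \neq b \neq c \neq d$, $a = b \neq c \neq d$, $a = b \neq c = d$, $a = b = c = d$, and $a = c \neq b = d$ (the other possible cases in (23) are essentially identical to these cases).

Let $S^{\phi_a}$, $S^{\phi_b}$, $S^{\phi_c}$, or $S^{\phi_d}$ be a quantized subsection of the axis of $\phi_a$, $\phi_b$, $\phi_c$, or $\phi_d$, respectively, and consider a subset $S^{\phi_a} \times S^{\phi_b} \times S^{\phi_c} \times S^{\phi_d}$ in the space of $\phi$. Moreover, let $\phi'_a$, $\phi'_b$, $\phi'_c$, and $\phi'_d$ be the quantized values, which are the midpoints of $S^{\phi_a}$, $S^{\phi_b}$, $S^{\phi_c}$, and $S^{\phi_d}$, respectively. The partial integral of $E[\phi_a e(\phi_b)\phi_c e(\phi_d)]$ restricted to this subset is

\[ \int_{S^{\phi_a} \times S^{\phi_b} \times S^{\phi_c} \times S^{\phi_d}} \phi_a e(\phi_b)\phi_c e(\phi_d) f(\phi_a, \phi_b, \phi_c, \phi_d)\,d\phi_a d\phi_b d\phi_c d\phi_d. \]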



Let $2\Delta\phi$ be the width of the largest side of the possible hyperrectangular regions in $\phi$ given by quantization; then, when $a \neq b \neq c \neq d$:



i> a e{fa)fae{fa)f{fa,fa, fa, fa)dc/) a dfadfadfa 



Is 



fae(fa)fae(fa) 



S4>a XS*1 X5*c xS^d 

x + Yl - + 2 " ~ + ~ ~ ~ ] d ^dfadfadfa 

2 4 



i,' a fa5 bd ^Afa + O(Afa), 



(75) 



fae(fa)fae(fa)f(fa,fa,fa,fa)dfadfadfadfa = (fafa5 a d + (J)' c Sd)i^Afa + 0(Afa), (76) 

S*«xS*ixS*=xS*i 



and similarly, when a = b ^ c ^ d: 

I 

and when a = b ^ c = d: 

I 



S<t»*xS<t'bxS*<:xS< l >d 



fae(fa)fae(fa) f (fa, fa, fa, fa)dfadfadfadfa 



= (fafa6 ac + fa6 a + faS c + <5 O )^A0 8 + 0(A4> 9 ). 



(77) 
Alternatively, when $a = b = c = d$:



I 



4>ae{4> a )4> a e{4>a)f((t>a,4>b, 4>c, 0d)#a#fcd0 c d0d = 4>a 8 — A(f> 6 + 0{A(j) 7 ) 



and similarly, when a = c ^ b = d: 



j. 



4>ae(4>b)4>ae((f)b)f((i>a,4>b, 4>c 4>d)d4> a d(i>bd^cd4>d = 4>aS — A(i> 6 + 0(A(f> 7 ). 



(78) 



(79) 



The above show that, when $\Delta\phi \to 0$, the rate of convergence of (75)-(77) to 0 is faster than that of (78) and (79). Therefore, we have the following:



st=l 



Aj/ max ^0 



N E 



tie 2 (0i ) 



□ 



Derivation of eq. (28) 

(24) /N 



ii)' 0) + 3flj 1 



(i) 



1 „-l 



{o^^-e^fo^i^y^f^ + oiAcj?) 



hi) + z g o l 



(") 



(iii) 



E/- ' («i(W'or^) J,ff2 (W/(W^ 

j ^(0l)' 

*?E 
*?E 
*?E 



(j) + 2 9 j 1 



j J (^)' <3> -2s; 



-2„2 



ff (</» 1 )- 2 <7 2 (</» 1 )/(0 1 )# 1 +O(A0) 



where $(\phi_1)_{(j)}$ is the midpoint of $S_j^{\phi_1}$, (i) is by Assumption 3.2.1, (ii) is by Assumption 3.2.2, and (iii) is by Assumption 3.2.1.
□ 

Proof of Theorem 3.1 

The optimal solution can be given by using a similar technique to that in [1, 15]. With the calculus of variations, the following 
Euler-Lagrange equation: 



0. 



where 



gives a differential equation: 



and the solution is: 



dfa \dgj 8G 

<J — CO 

-1 (-2 S (0 1 )-V(0 1 )/(0 1 )) = 0, 
= Kai(4>i)f^(4>i), K : constant. 
The constant $K$ is directly calculated from condition (31), and the value of the objective function is derived as follows.



^(^(0i)/*(0i))-V(^)/(^i)#i 



= J ^0?*rVf(0i)/*(0i)d0i = ^e{K- z D = -eirr \j 



□ 



Proof of Theorem 3.2 

We use a similar technique to that in [11, 2]. Let A be a Lagrange multiplier and consider the minimization of the following 
quantity. 

J ?{g{fa))dfa + XH d (f,g) = J (— 1-) <?{fa)f{!h) - A/(0i)log (g-\fa)) #i + XH d (f) 

^0 2 J(4>i) (g- 2 (4>i)o 2 (4>i) + Alo gff (0i)) + XH(f) 



By applying the calculus of variations, we obtain: 

^- (VV(<M + Alog 3 



-2.g 3 a- 2 (0i) + A.g x = constant. 



Fix the constant to be zero, then, 



9 = [ j ) <r(<h), 



and by substituting this for H(f, g), we obtain: 

H(f, g) = J -f log g- 1 fd4> 1 = log 



2\ 5 
A 



Therefore, 



- ] = cxp ( / / log 



-/log-t-#! =logM. 
o-(^i) 



+ logM 



and (41) is derived. By substituting g v for the objective integral, the following is derived. 



1 

12 



1 ~,A 1 ~, 



□ 



Proof of Lemma 4.1 

The left hand side of (51) is extended: 



N 



In (80), terms of the form E 



AT 



+ 2E 



N-l 



Mt)<Mt))Mt + iWMt + 1)) 



+ 



= NE [^e 2 ^)] + 2(N - 1)E [0 fc e(&)0 fc+1 e(&) 



(80) 



(j ) a^{4'b)4 l ce{(f)d) appear and in general, when (49) and (50) are satisfied, E ^ a e(^b)^ c e(^d) 



can be calculated according to the combinations of a, b, c and d as follows. 
When $a \neq b \neq c \neq d$,



4> a e(4>b)4>ce(4>d) = / (t> a e{fo)<t>ce{4>d)f{4>aAb,4>cAd)d4> a d4> b d4> c d4> < i 



j e((j> b )4> c e(4>d) {^J 0af($a,0b,<t>c,<j>d)d<f)a^j #fe# c #d ( = } J e(4>b)<f> c e{(f>d) x x d<f>bd<f> c d(f>d = 0, 



and similarly, when a = b ^ c ^ d, 



(j) a e((j) b )(j) c e((j) d ) = J (j) a e{(t)a)(l>ce{(t ) d)f{(t ) a 1 (j)c 1 (j)d)d(t)ad(j)cd(t)d 

J 4>ae(<l>a)e(<l>d) (^J ' $cf{$a,$b,$c,$d)d<t><^ d(j) a d<j) d ( = } J <j>a,e(4>a)e(4>d) X X # d</> d = 0, 



and when a = b ^ c = d, 



4>ae(4>b)4>ce(4>d) = / 4> a e{4>a)4>ce{4>c)f{4>a,4>c)d4> a d4> c 



4>ce{cj>c)f{4>a,(t>c)dcf> c ) d4> a (50)(l = (47)) 



y 4> a e{(j> a ) 



x x d(j> a = 0. 



On the other hand, there is no term when a = c ^ b ^ d or b = d ^ a ^ c in (80). Finally, when a — c,b = d, 



The other cases are essentially equivalent to one of the above cases (for example, a = d^= b ^ cis equivalent to a = b ^ c ^ 
d). 

From the above, it follows that: 



N 



= iVE 



□ 



Proof of Proposition 4.1 

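Consider $S_1^y = (0, d_1]$ (equivalently $S_1^{\phi_1}$ on $\phi_1$) and $S_2^y = (d_1, d_2]$ (equivalently $S_2^{\phi_1}$ on $\phi_1$), where their boundaries $d_1$, $d_2$ have the relationship:

\[ d_1 = r_1 d_2, \quad r_1 \in [0, 1] \tag{81} \]

with an appropriate ratio $r_1$. The quantized values $y'_{(1)}$ and $y'_{(2)}$ for the subsections $S_1^y$ on $y$ (or $S_1^{\phi_1}$ on $\phi_1$) and $S_2^y$ (or $S_2^{\phi_1}$) satisfying the bias-free condition:

\[ E_{S_j^{\phi_1}}[\phi_1 e(\phi_1)] = 0, \quad j = 1, 2 \]

are given as follows. Let $y'_{(1)} := \frac{r_1 d_2}{2} + h_1$, where $h_1$ is an offset from the center of $S_1^y$; then,

\[ E_{S_1^{\phi_1}}[\phi_1 e(\phi_1)] = \int_{-k_1}^{k_1}\left(\frac{r_1 d_2}{2} + z\right)(z - h_1)\frac{1}{2\kappa_y}\,dz = \frac{1}{2\kappa_y}\left(\frac{2}{3}k_1^3 - r_1 d_2 h_1 k_1\right), \quad k_1 := \frac{r_1 d_2}{2}, \]

and therefore,

\[ h_1 = \frac{2}{3}k_1^2\frac{1}{r_1 d_2} = \frac{1}{6}r_1 d_2. \]

Similarly, let $y'_{(2)} := \frac{(1 + r_1)d_2}{2} + h_2$, where $h_2$ is the offset; then,

\[ E_{S_2^{\phi_1}}[\phi_1 e(\phi_1)] = \int_{-k_2}^{k_2}\left(\frac{d_2 + r_1 d_2}{2} + z\right)(z - h_2)\frac{1}{2\kappa_y}\,dz = \frac{1}{2\kappa_y}\left(\frac{2}{3}k_2^3 - (d_2 + r_1 d_2)h_2 k_2\right), \quad k_2 := \frac{d_2(1 - r_1)}{2}, \]

and therefore,

\[ h_2 = \frac{2}{3}k_2^2\frac{1}{d_2(1 + r_1)} = \frac{1}{6}\frac{(1 - r_1)^2}{1 + r_1}d_2. \]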



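By using these $y'_{(1)}$ and $y'_{(2)}$, the variances of $\phi_1 e(\phi_1)$ in each subsection can be calculated as follows. Let $V_{S_j^{\phi_1}}[\sigma(\phi_1)e(\phi_1)]$ denote the quantity:

\[ V_{S_j^{\phi_1}}[\sigma(\phi_1)e(\phi_1)] := \int_{S_j^{\phi_1}} \sigma^2(\phi_1)e^2(\phi_1)f(\phi_1)\,d\phi_1, \]

where

\[ \sigma^2(\phi_1) = \phi_1^2 + \frac{1}{3}\kappa_y^2(n - 1); \]

then, for even $M$,

\[ V_{S_1^{\phi_1}}[\sigma(\phi_1)e(\phi_1)] = \int_{-k_1}^{k_1}\left\{\left(\frac{r_1 d_2}{2} + z\right)^2 + \frac{1}{3}\kappa_y^2(n - 1)\right\}(z - h_1)^2\frac{1}{2\kappa_y}\,dz = \frac{1}{2160}\frac{1}{2\kappa_y}d_2^5\left(32 r_1^5\right) + \frac{1}{27}\frac{1}{2\kappa_y}\kappa_y^2(n - 1)d_2^3 r_1^3 \]

and similarly

\[ V_{S_2^{\phi_1}}[\sigma(\phi_1)e(\phi_1)] = \int_{-k_2}^{k_2}\left\{\left(\frac{d_2(1 + r_1)}{2} + z\right)^2 + \frac{1}{3}\kappa_y^2(n - 1)\right\}(z - h_2)^2\frac{1}{2\kappa_y}\,dz \]
\[ = \frac{1}{2160}\frac{1}{2\kappa_y}d_2^5\left\{-18(1 - r_1)^5 + 45(1 + r_1)^2(1 - r_1)^3 + 5(1 - r_1)^7(1 + r_1)^{-2}\right\} + \frac{1}{108}\frac{1}{2\kappa_y}\kappa_y^2(n - 1)d_2^3\left\{3(1 - r_1)^3 + \frac{(1 - r_1)^5}{(1 + r_1)^2}\right\}. \]

Therefore, the sum of $V_{S_1^{\phi_1}}[\sigma(\phi_1)e(\phi_1)]$ and $V_{S_2^{\phi_1}}[\sigma(\phi_1)e(\phi_1)]$ is:

\[ V_{S_1^{\phi_1}}[\sigma(\phi_1)e(\phi_1)] + V_{S_2^{\phi_1}}[\sigma(\phi_1)e(\phi_1)] = \frac{1}{2160}\frac{1}{2\kappa_y}\left(d_2^5\,\psi(r_1; 32) + 20\kappa_y^2(n - 1)d_2^3\,\xi(r_1; 4)\right) \tag{82} \]

\[ \psi(r_1; 32) := 32 r_1^5 - 18(1 - r_1)^5 + 45(1 + r_1)^2(1 - r_1)^3 + 5(1 - r_1)^7(1 + r_1)^{-2}, \]
\[ \xi(r_1; 4) := 4 r_1^3 + 3(1 - r_1)^3 + \frac{(1 - r_1)^5}{(1 + r_1)^2}. \tag{83} \]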

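The minimizer $r_1^o$ of this sum is given by:

\[ r_1^o = \arg\min_{r_1 \in [0,1]}\left(d_2^5\,\psi(r_1; 32) + 20\kappa_y^2(n - 1)d_2^3\,\xi(r_1; 4)\right) \]
\[ \psi_1^{\min} := \psi(r_1^o; 32), \qquad \xi_1^{\min} := \xi(r_1^o; 4), \]

and

\[ \min_{r_1}\left(V_{S_1^{\phi_1}}[\sigma(\phi_1)e(\phi_1)] + V_{S_2^{\phi_1}}[\sigma(\phi_1)e(\phi_1)]\right) = \frac{1}{2160}\frac{1}{2\kappa_y}\left(d_2^5\,\psi_1^{\min} + 20\kappa_y^2(n - 1)d_2^3\,\xi_1^{\min}\right). \]

Note that the optimal $r_1^o$ is independent of the value of $d_2$, which is the upper boundary of $S_2^y$.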

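Next, we successively consider another subsection $S_3^y$ on $y$ (or $S_3^{\phi_1}$ on $\phi_1$) together with $S_1^y$ (or $S_1^{\phi_1}$) and $S_2^y$ (or $S_2^{\phi_1}$). Assume the relation between $d_2$ and $d_3$ is:

\[ d_2 = r_2 d_3, \]

where $r_2$ is an appropriate number in $[0, 1]$. Similar to the case of $S_1^y$ and $S_2^y$, the offset $h_3$ of $y'_{(3)}$ for the subsection $S_3^y$ on $y$ (or $S_3^{\phi_1}$ on $\phi_1$) satisfying $E_{S_3^{\phi_1}}[\phi_1 e(\phi_1)] = 0$ is

\[ h_3 = \frac{2}{3}k_3^2\frac{1}{d_3(1 + r_2)} = \frac{1}{6}\frac{(1 - r_2)^2}{1 + r_2}d_3, \qquad k_3 := \frac{d_3(1 - r_2)}{2}, \]

and $V_{S_3^{\phi_1}}[\sigma(\phi_1)e(\phi_1)]$ can be given as

\[ V_{S_3^{\phi_1}}[\sigma(\phi_1)e(\phi_1)] = \int_{-k_3}^{k_3}\left\{\left(\frac{d_3(1 + r_2)}{2} + z\right)^2 + \frac{1}{3}\kappa_y^2(n - 1)\right\}(z - h_3)^2\frac{1}{2\kappa_y}\,dz \]
\[ = \frac{1}{2160}\frac{1}{2\kappa_y}d_3^5\left\{-18(1 - r_2)^5 + 45(1 + r_2)^2(1 - r_2)^3 + 5(1 - r_2)^7(1 + r_2)^{-2}\right\} + \frac{1}{108}\frac{1}{2\kappa_y}\kappa_y^2(n - 1)d_3^3\left\{3(1 - r_2)^3 + \frac{(1 - r_2)^5}{(1 + r_2)^2}\right\}. \]

Therefore, the optimal $r_2$ that minimizes $V_{S_1^{\phi_1}}[\sigma(\phi_1)e(\phi_1)] + V_{S_2^{\phi_1}}[\sigma(\phi_1)e(\phi_1)] + V_{S_3^{\phi_1}}[\sigma(\phi_1)e(\phi_1)]$ is found by solving the following minimization problem:

\[ r_2^o := \arg\min_{r_2}\,\min_{r_1}\sum_{j=1}^{3} V_{S_j^{\phi_1}}[\sigma(\phi_1)e(\phi_1)] = \arg\min_{r_2}\frac{1}{2160}\frac{1}{2\kappa_y}\left(d_3^5\,\psi(r_2; \psi_1^{\min}) + 20\kappa_y^2(n - 1)d_3^3\,\xi(r_2; \xi_1^{\min})\right) \]
\[ \psi(r_2; \psi_1^{\min}) := \psi_1^{\min} r_2^5 - 18(1 - r_2)^5 + 45(1 + r_2)^2(1 - r_2)^3 + 5(1 - r_2)^7(1 + r_2)^{-2} \]
\[ \xi(r_2; \xi_1^{\min}) := \xi_1^{\min} r_2^3 + 3(1 - r_2)^3 + \frac{(1 - r_2)^5}{(1 + r_2)^2}. \]

By repeating the above process, we obtain the result.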

Lemma A.1 A rational function

\[ \psi(r) := ar^5 - 18(1 - r)^5 + 45(1 + r)^2(1 - r)^3 + 5(1 - r)^7(1 + r)^{-2} \]

has only one local minimum in $r \in (0, 1)$ when $a > 0$.

Refer to [24] for the proof.
Proof of Lemma 4.2 

From Lemma A.1, it is known that $\psi(r; \psi_0^{\min} = 32)$ has only one local minimum in $r \in (0, 1)$. Moreover, from

\[ \psi(0; a) = 32,\ \forall a > 0, \qquad \psi(1; \psi_{j-1}^{\min}) = \psi_{j-1}^{\min}, \qquad \psi_0^{\min} = 32, \]

the minimum value $\psi_1^{\min}$ satisfies

\[ \psi_1^{\min} < 32. \]

Next, $\psi(r; \psi_1^{\min})$ satisfies

\[ \psi(0; \psi_1^{\min}) = 32, \qquad \psi(1; \psi_1^{\min}) = \psi_1^{\min} < 32, \tag{84} \]

and also $\psi(r; \psi_1^{\min})$ has only one local minimum in $r \in (0, 1)$. This means

\[ \psi_2^{\min} < \psi_1^{\min}. \]

The difference between $\psi(r; \psi_0^{\min})$ and $\psi(r; \psi_1^{\min})$ is only the coefficient of the term $r^5$, and $r^5$ is a strictly increasing function in $(0, 1]$. Therefore, with $\psi_0^{\min} > \psi_1^{\min}$,

\[ r_1^o < r_2^o < 1. \]

By repeating the same process, we finally obtain:

\[ r_1^o < r_2^o < r_3^o < \cdots < 1. \]

Next, we show $\lim_{j\to\infty} r_j^o = 1$. Let $\lim_{j\to\infty} r_j^o = r_\infty$ and $\lim_{j\to\infty}\psi_j^{\min} = \psi_\infty^{\min}$. Then, $r_\infty$ satisfies:

\[ r_\infty := \arg\min_{r \in [0,1]} \psi(r; \psi_\infty^{\min}). \]

Note that if $\psi_\infty^{\min} > 0$, $\psi(r; \psi_\infty^{\min})$ also has only one local minimum in $r \in (0, 1)$. On the other hand, when $\psi_\infty^{\min} = 0$, it is also known from the proof of Lemma A.1 that $\psi(r; \psi_\infty^{\min})$ is a decreasing function in $r \in [0, 1]$ and $\min_r \psi(r; \psi_\infty^{\min}) = \psi(1; \psi_\infty^{\min})$. From (56), $\psi(1; \psi_\infty^{\min}) = \psi_\infty^{\min}$, and the minimum is at $r = 1$. This means $r_\infty = 1$ (and $\psi_\infty^{\min} = 0$). □
Proof of Lemma 4.3 

On the subsections Sf 1 (<SJ) and (5j +1 ), i.e., the general case for (81) - (84), from: 



dj + dj+i 



+ z\(z- hj) dz = -k] - {dj + d j+1 )hjkj, 



the offsets hj and hj + \ such that E $ 1 0ie(^i) 



Oand E 



<j>ie(<j>i) = are given by: 



ft - 2 1 1,2 , ._ ~ 4? 



3 dj+i + dj +2 



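On the other hand, $V_{\mathcal{S}_j}[\phi_i e(\phi_i)]$ is calculated by:

$$\int_{-k_j/2}^{k_j/2} \left( \frac{d_j + d_{j+1}}{2} + z \right)^2 (z - h_j)^2\, dz = A (d_{j+1} - d_j)^5 + B (d_j + d_{j+1})^2 (d_{j+1} - d_j)^3,$$

where

$$A := \frac{1}{5 \cdot 2^4} - \frac{1}{3^2 \cdot 2^3} < 0, \qquad B := \frac{1}{3 \cdot 2^4} > 0.$$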



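Therefore:

$$\begin{aligned}
V_{\mathcal{S}_j}[\phi_i e(\phi_i)] + V_{\mathcal{S}_{j+1}}[\phi_i e(\phi_i)] &= A (d_{j+1} - d_j)^5 + B (d_{j+1} + d_j)^2 (d_{j+1} - d_j)^3 \\
&\quad + A (d_{j+2} - d_{j+1})^5 + B (d_{j+2} + d_{j+1})^2 (d_{j+2} - d_{j+1})^3 \\
&=: Z(d_{j+1}).
\end{aligned} \tag{85}$$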



For given $d_j$ and $d_{j+2}$, consider on which side of the midpoint of $d_j$ and $d_{j+2}$ the minimum point of $Z(d_{j+1})$ lies. From $A < 0$ and $B > 0$ and the symmetric structure of $Z(d_{j+1})$, except for the terms $(d_{j+1} + d_j)^2$ and $(d_{j+2} + d_{j+1})^2$ where $B(d_{j+1} + d_j)^2 < B(d_{j+2} + d_{j+1})^2$, it is known that $Z(d_{j+1})$ has its minimum at $d_{j+1} > \frac{d_j + d_{j+2}}{2}$. This means $|S_j^{+1}| > |S_{j+1}^{+1}|$, that is, $|\mathcal{S}_j| > |\mathcal{S}_{j+1}|$. The same applies for arbitrary sections $S_j^{\pm 1}$ and $S_{j+1}^{\pm 1}$, and we can conclude the statement is true. □

Proof of Lemma 4.5 

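From Lemma 4.2 and its proof, it is known that when $j \to \infty$, $r_j^{o}$ and $\psi_j^{\min}$ converge to 1 and 0, respectively. Therefore, by employing the Taylor series expansion, $\psi(r; \psi_{j-1}^{\min})$ can be represented by:

$$\psi(r; \psi_{j-1}^{\min}) = \psi_{j-1}^{\min} \left(1 - 5(1 - r) + 10(1 - r)^2 - 10(1 - r)^3\right) + 45 \cdot 2^2 (1 - r)^3 + O((1 - r)^4)$$

near $r = 1$ at sufficiently large $j$. By applying a variable transformation $1 - r =: \epsilon$, we obtain

$$\psi(\epsilon; \psi_{j-1}^{\min}) = \psi_{j-1}^{\min} (1 - 5\epsilon + 10\epsilon^2 - 10\epsilon^3) + 180\epsilon^3 + O(\epsilon^4) \tag{86}$$

at $\epsilon \to 0$. Denote the local minimum of $\psi(\epsilon; \psi_{j-1}^{\min})$ as $\epsilon_j$, then $\epsilon_j$ must satisfy:

$$\psi_{j-1}^{\min} (-5 + 20\epsilon_j - 30\epsilon_j^2) + 540\epsilon_j^2 + O(\epsilon_j^3) = 0. \tag{87}$$

From (87), it is simple to verify that:

$$\epsilon_j = \left(\frac{\psi_{j-1}^{\min}}{108}\right)^{1/2} + o\left((\psi_{j-1}^{\min})^{1/2}\right) \tag{88}$$

at $\psi_{j-1}^{\min} \to 0$. On the other hand, from (86), $\psi_j^{\min}$ is represented by:

$$\psi_j^{\min} = \psi_{j-1}^{\min} (1 - 5\epsilon_j + 10\epsilon_j^2 - 10\epsilon_j^3) + 180\epsilon_j^3 + O(\epsilon_j^4) \tag{89}$$

and with (88), we obtain:

$$\psi_j^{\min} - \psi_{j-1}^{\min} = -5 \cdot 3^{-5/2} (\psi_{j-1}^{\min})^{3/2} + o\left((\psi_{j-1}^{\min})^{3/2}\right) =: \mathcal{P}(\psi_{j-1}^{\min}). \tag{90}$$

With the convergence $\psi_j^{\min} \to 0$, we derive the statement of the lemma. □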
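
The leading-order behaviour in this proof can be checked numerically on the truncated cubic. The sketch below is a minimal illustration, assuming the expansion $\psi(\epsilon; \psi_{j-1}^{\min}) \approx \psi_{j-1}^{\min}(1 - 5\epsilon + 10\epsilon^2 - 10\epsilon^3) + 180\epsilon^3$ with the $O(\epsilon^4)$ remainder dropped. It solves the stationarity condition exactly as a quadratic in $\epsilon$ and compares the result with $(\psi_{j-1}^{\min}/108)^{1/2}$ and $-5 \cdot 3^{-5/2} (\psi_{j-1}^{\min})^{3/2}$. The function names are hypothetical.

```python
import math


def stationary_eps(psi_prev):
    """Positive root of psi_prev*(-5 + 20 e - 30 e^2) + 540 e^2 = 0."""
    qa = 540.0 - 30.0 * psi_prev   # coefficient of e^2
    qb = 20.0 * psi_prev           # coefficient of e
    qc = -5.0 * psi_prev           # constant term
    return (-qb + math.sqrt(qb * qb - 4.0 * qa * qc)) / (2.0 * qa)


def truncated_psi(eps, psi_prev):
    """Cubic truncation of psi near r = 1, written in eps = 1 - r."""
    return psi_prev * (1 - 5 * eps + 10 * eps**2 - 10 * eps**3) + 180 * eps**3


def decrement(psi_prev):
    """Change in the minimum value over one step of the recursion."""
    return truncated_psi(stationary_eps(psi_prev), psi_prev) - psi_prev


if __name__ == "__main__":
    for p in (1e-2, 1e-4, 1e-6):
        eps_ratio = stationary_eps(p) / math.sqrt(p / 108.0)
        dec_ratio = decrement(p) / (-5.0 * 3.0**-2.5 * p**1.5)
        print(p, eps_ratio, dec_ratio)
```

Both ratios should approach 1 as the previous minimum value tends to 0, consistent with the leading terms above.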
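Lemma A.2 $\psi(m) > \bar{\psi}(m)$ at $m = 0, 1, \ldots$, where $\psi(0) > \bar{\psi}(0)$.

Proof First define $\bar{\psi}'(m)$ for $m \in \mathbb{R}$, which is a simple linear interpolation of $\bar{\psi}(m)$ at $m = 0, 1, \ldots$, and the gradient between $\bar{\psi}'(m - 1)$ and $\bar{\psi}'(m)$ ($m = 1, 2, \ldots$) is a constant $\mathcal{P}(\bar{\psi}'(m - 1)) = a \bar{\psi}'^{\,b}(m - 1) + o(\bar{\psi}'^{\,b}(m - 1))$ $(< 0)$. Assume that $\psi(m)$ crosses $\bar{\psi}'(m)$ downward at $m = m'$ between $m - 1$ and $m$. Note that $\psi(m') < \bar{\psi}'(m - 1) = \bar{\psi}(m - 1)$, therefore,

$$\left.\frac{d\psi(m)}{dm}\right|_{m = m'} = (a + \nu) \psi^{b}(m') > \mathcal{P}(\psi(m')) > \mathcal{P}(\bar{\psi}'(m - 1)) = a \bar{\psi}'^{\,b}(m - 1) + o(\bar{\psi}'^{\,b}(m - 1)).$$

This contradicts the assumption that $\psi(m)$ crosses $\bar{\psi}'(m)$ downward at $m = m'$. □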
Proof of Theorem 5.1 

First evaluate the magnitude of $U^T E$. From (28), (29), and (36),

$$E\left[U^T E\right] = 0, \qquad V\left[U^T E\right] = \frac{1}{12} d_1 D^6 M^{-4} N.$$



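Then by Chebyshev's inequality, we obtain:

$$\mathrm{Prob}\left( \|U^T E\|_\infty > \sqrt{\frac{V\left[U^T E\right]}{\beta_2}} \right) < \beta_2,$$

for a reliability index $\beta_2$. Combine $(U^T U)^{-1}$ and $U^T E$ using the norm inequality:

$$\|(U^T U)^{-1} U^T E\|_\infty \le \|(U^T U)^{-1}\|_\infty \, \|U^T E\|_\infty,$$

and this gives:

$$\mathrm{Prob}\left( \|(U^T U)^{-1} U^T E\|_\infty < \epsilon_1 \epsilon_2 \right) \ge \mathrm{Prob}\left( \|(U^T U)^{-1}\|_\infty < \epsilon_1 \ \text{and} \ \|U^T E\|_\infty < \epsilon_2 \right).$$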
Therefore we have proved the statements. □ 
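
The two elementary facts this proof combines can be illustrated in a few lines. For a zero-mean scalar with variance $\sigma^2$, Chebyshev's inequality gives $\mathrm{Prob}(|x| > \sigma/\sqrt{\beta}) \le \beta$, and the induced $\infty$-norm satisfies $\|Ax\|_\infty \le \|A\|_\infty \|x\|_\infty$. The sketch below uses hypothetical helper names and arbitrary numbers that are not taken from the paper.

```python
import math


def inf_norm_vec(v):
    """Infinity norm of a vector: largest absolute entry."""
    return max(abs(x) for x in v)


def inf_norm_mat(m):
    """Induced infinity norm of a matrix: largest absolute row sum."""
    return max(sum(abs(x) for x in row) for row in m)


def matvec(m, v):
    """Plain matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def chebyshev_threshold(variance, beta):
    """Threshold t with variance / t^2 = beta, so Prob(|x| > t) <= beta."""
    return math.sqrt(variance / beta)


if __name__ == "__main__":
    M = [[2.0, -1.0], [0.5, 3.0]]   # stands in for (U^T U)^{-1}
    v = [1.0, -2.0]                 # stands in for U^T E
    lhs = inf_norm_vec(matvec(M, v))
    rhs = inf_norm_mat(M) * inf_norm_vec(v)
    print(lhs, rhs, lhs <= rhs)
    print(chebyshev_threshold(4.0, 0.25))
```

Whenever both factors are below their thresholds $\epsilon_1$ and $\epsilon_2$, the product bound forces the norm of the combined term below $\epsilon_1 \epsilon_2$, which is the event inclusion used in the last probability inequality.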